Google DeepMind on Thursday introduced the next generation of its gaming-focused artificial intelligence agent, the Scalable Instructable Multiworld Agent, or SIMA 2. The upgraded system builds on the first version, launched in March 2024, and brings notable gains in reasoning, adaptability and user interaction. The company says the agent learns continuously and becomes more capable through its own play.
How does SIMA 2 work?
In its announcement, DeepMind highlighted that SIMA 2 can now reflect on its actions and think through the steps required to complete a task. The agent is powered by Google’s Gemini models and is designed to follow human-issued instructions, understand what has been asked, and plan its next moves based on the virtual environment it sees on screen.
The system receives visual input from a three-dimensional game world along with a user-defined goal, such as “build a shelter” or “find the red house”. It then breaks that goal into a sequence of smaller actions and performs them using controls similar to a keyboard and mouse.
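The loop described above (a goal and a view of the screen go in, a sequence of keyboard-and-mouse-style actions comes out) can be illustrated with a toy sketch. This is not DeepMind's implementation: the names `Action`, `toy_planner`, `toy_controller` and `run_agent`, the lookup-table planner and the control strings are all hypothetical stand-ins for the Gemini-powered components the announcement describes.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One keyboard- or mouse-style control event (hypothetical format)."""
    device: str   # "keyboard" or "mouse"
    command: str  # e.g. "hold W" or "click left"


def toy_planner(goal: str, frame: str) -> List[str]:
    """Break a goal into smaller steps.

    Assumption: a fixed lookup table stands in for the model that reasons
    over the goal and the on-screen frame; the frame is ignored here.
    """
    plans = {
        "find the red house": ["look around", "walk toward red house"],
        "build a shelter": ["collect wood", "place walls", "place roof"],
    }
    return plans.get(goal, [])


def toy_controller(step: str) -> List[Action]:
    """Translate one step into low-level control events (made-up mapping)."""
    if step.startswith("look"):
        return [Action("mouse", "move right")]
    if step.startswith("walk"):
        return [Action("keyboard", "hold W")]
    return [Action("mouse", "click left")]


def run_agent(
    goal: str,
    frame: str,
    planner: Callable[[str, str], List[str]] = toy_planner,
    controller: Callable[[str], List[Action]] = toy_controller,
) -> List[Action]:
    """Plan the goal into steps, then emit the control events for each step."""
    actions: List[Action] = []
    for step in planner(goal, frame):
        actions.extend(controller(step))
    return actions


if __name__ == "__main__":
    for action in run_agent("find the red house", frame="frame-0"):
        print(action.device, action.command)
```

Keeping the planner and the controller as separate functions mirrors the two stages in the description: deciding what to do, then carrying it out through generic game controls.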
What can SIMA 2 do?
According to the company, one of the most significant advances is SIMA 2’s improved ability to operate in games it has not previously encountered. DeepMind tested the agent in new environments such as MineDojo, a research adaptation of Minecraft, and ASKA, a Viking-themed survival game. In both cases, SIMA 2 delivered higher success rates than the earlier version.
The system also handles multimodal prompts, including sketches, emojis and a range of languages. It can apply concepts learned in one game to another. For example, an understanding of mining in a sandbox world can help it grasp harvesting in a different survival setting.
How is SIMA 2 trained?
Google states that the second-generation agent is trained using a mix of human demonstration data and automatically generated annotations from the Gemini models. When SIMA 2 learns a new movement or skill in a fresh environment, that experience is captured and fed back into the training pipeline. DeepMind says this reduces dependence on human-labelled examples and allows the agent to refine itself over time.
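As a rough illustration of that feedback loop, the sketch below mixes human demonstrations with automatically labelled self-play episodes in one training pool. It is a simplified assumption, not DeepMind's pipeline: `Example`, `toy_annotator`, `add_self_play` and `human_fraction` are hypothetical names, and the annotator is a string template standing in for model-generated annotations.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    """One training example (hypothetical schema)."""
    trajectory: str  # identifier for a recorded episode
    label: str       # description of what the episode shows
    source: str      # "human" or "self_play"


def toy_annotator(trajectory: str) -> str:
    """Stand-in for automatically generated annotations (assumption)."""
    return f"auto-label for {trajectory}"


def add_self_play(
    pool: List[Example],
    trajectories: List[str],
    annotate: Callable[[str], str] = toy_annotator,
) -> List[Example]:
    """Return a new pool with the agent's own episodes labelled and appended."""
    new_examples = [Example(t, annotate(t), "self_play") for t in trajectories]
    return pool + new_examples


def human_fraction(pool: List[Example]) -> float:
    """Share of the pool that came from human demonstrations."""
    if not pool:
        return 0.0
    return sum(e.source == "human" for e in pool) / len(pool)


if __name__ == "__main__":
    pool = [Example("demo-1", "chop tree", "human")]
    pool = add_self_play(pool, ["ep-1", "ep-2", "ep-3"])
    print(human_fraction(pool))
```

In this toy version, each round of self-play lowers the share of human-labelled examples in the pool, which is the effect the company describes.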
What are the limitations of SIMA 2?
Despite the progress, the system still has notable limitations. The agent's memory of past interactions is limited, long-range reasoning that requires many steps remains difficult, and precise low-level control, similar to robotic joint movements, is not addressed in the current framework.
A path toward real-world robotics
DeepMind stresses that SIMA 2 is not intended as a gaming assistant. Instead, the company views three-dimensional game worlds as a useful testing ground for AI agents that could eventually control real-world robots. According to Google, the broader objective is to develop general-purpose machines that can follow natural-language instructions and handle a variety of tasks in complex physical settings.
