
Google DeepMind has launched Gemini Robotics, a suite of AI models that enables robots to perform complex physical tasks with unprecedented adaptability and dexterity.
Key Points
- Gemini Robotics combines vision, language, and action to improve how robots interact with the world.
- The models allow robots to complete complex tasks and adapt to new environments without specific training.
- DeepMind is collaborating with Apptronik to integrate these AI models into humanoid robots.
- Safety measures are built into the models to ensure responsible and reliable deployment.
Robots that understand language, see the world, and act with precision have long been a dream of AI researchers. Google DeepMind’s latest initiative, Gemini Robotics, pushes that dream closer to reality. Built on the foundation of Gemini 2.0, these AI models bring advanced reasoning, adaptability, and dexterity to robots—allowing them to fold origami, pack lunch boxes, and engage in dynamic human interactions.
Google DeepMind's new models, Gemini Robotics and Gemini Robotics-ER, represent what the company calls a foundation for "bringing AI into the physical world" by enabling robots to understand and interact with their surroundings in more sophisticated ways than previously possible.
The primary model, Gemini Robotics, is described as an advanced vision-language-action (VLA) system that adds physical actions as a new output modality to the existing Gemini 2.0 framework. Its counterpart, Gemini Robotics-ER (for embodied reasoning), focuses on enhanced spatial understanding, giving roboticists a model they can pair with their own control programs for improved performance.
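DeepMind has not published developer documentation or a public API for these models, but the basic contract of a vision-language-action system can be sketched in rough terms: camera images and a natural-language instruction go in, and a short horizon of low-level robot commands comes out. The sketch below is purely illustrative; the class names, fields, and control loop are assumptions for explanation, not Gemini Robotics' actual interface.

```python
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class Observation:
    """A single robot observation: camera frames plus joint state."""
    rgb_images: List[np.ndarray]   # one H x W x 3 array per camera
    joint_positions: np.ndarray    # current arm/gripper joint angles


@dataclass
class ActionChunk:
    """A short horizon of low-level commands produced by the model."""
    joint_targets: np.ndarray      # shape: (timesteps, num_joints)


class VisionLanguageActionModel:
    """Hypothetical VLA interface: perception and language in, actions out."""

    def predict_actions(self, observation: Observation, instruction: str) -> ActionChunk:
        # A real VLA model would run a large multimodal network here;
        # this stub only illustrates the input/output contract.
        raise NotImplementedError


def run_task(model: VisionLanguageActionModel, robot, instruction: str, max_steps: int = 100):
    """Illustrative control loop: re-query the model every chunk so the robot
    can adapt when objects move or the instruction changes mid-task."""
    for _ in range(max_steps):
        obs = robot.get_observation()
        chunk = model.predict_actions(obs, instruction)
        for joint_target in chunk.joint_targets:
            robot.send_joint_command(joint_target)
```

The point of the loop is the re-planning step: because the model is queried repeatedly against fresh observations, a moving object or a revised spoken instruction changes the next action chunk rather than derailing a fixed plan, which is the kind of interactivity DeepMind highlights in its demonstrations.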
"We've always thought of robotics as a helpful testing ground for translating AI advances into the physical world," Google CEO Sundar Pichai noted in post on X. "Today we're taking our next step in this journey with our newest Gemini 2.0 robotics models."
According to Google DeepMind, the new models excel in three critical areas: generality, interactivity, and dexterity. The company claims Gemini Robotics "more than doubles performance on a comprehensive generalization benchmark compared to other state-of-the-art vision-language-action models," allowing robots to handle novel situations and objects without specific training.
In demonstration videos, the models enable robots to perform delicate tasks like folding origami, packing lunch items into ziplock bags, and adjusting to changing environments in real-time. The robots can follow natural language instructions in conversational terms and adapt when objects move or plans change.
The dexterity demonstrated in these tasks represents a significant leap forward in robotics. Many everyday activities that humans perform without thinking require fine motor control that has traditionally challenged robotic systems. The videos show robots successfully manipulating small objects with precision, suggesting practical applications may be closer than previously thought.
Google DeepMind is partnering with Apptronik to integrate these models into humanoid robots, while also making Gemini Robotics-ER available to trusted testers including Agile Robots, Agility Robotics, Boston Dynamics, and Enchanted Tools.
This development comes amid growing competition in the AI-powered robotics field. Several major technology companies and startups have been working to translate large language model capabilities into physical form factors, though most consumer-facing applications remain in early stages.
Google DeepMind emphasizes it's taking a "layered, holistic approach" to safety, incorporating traditional robotics safeguards while leveraging Gemini's core safety features. The company also announced a new dataset called ASIMOV to help researchers measure the safety implications of robotic actions in real-world scenarios.
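DeepMind has not detailed how its safety layers are implemented, but the general idea of a "layered" approach is that a learned policy proposes actions and independent safeguards can veto them before anything reaches the hardware. The sketch below illustrates that pattern under assumed names and limits; none of it reflects DeepMind's actual code.

```python
# Illustrative sketch of layered safety checks around a learned policy.
# All function names, limits, and robot methods here are hypothetical.
import numpy as np


def within_joint_limits(joint_targets: np.ndarray, lower: np.ndarray, upper: np.ndarray) -> bool:
    """Classic robotics safeguard: reject commands outside hardware limits."""
    return bool(np.all(joint_targets >= lower) and np.all(joint_targets <= upper))


def within_velocity_limits(joint_targets: np.ndarray, current: np.ndarray, max_delta: float) -> bool:
    """Reject commands that would demand implausibly fast joint motion."""
    return bool(np.all(np.abs(joint_targets - current) <= max_delta))


def execute_if_safe(robot, joint_targets, lower, upper, max_delta):
    """Only forward the model's proposed command if every layer approves it."""
    current = robot.get_joint_positions()
    checks = [
        within_joint_limits(joint_targets, lower, upper),
        within_velocity_limits(joint_targets, current, max_delta),
        # Higher-level semantic checks ("should the robot do this at all?")
        # would sit above these low-level guards; evaluating that kind of
        # judgment is what a dataset like ASIMOV is intended to support.
    ]
    if all(checks):
        robot.send_joint_command(joint_targets)
    else:
        robot.stop()  # fall back to a safe state instead of executing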
What sets Gemini Robotics apart is its versatility across different robot types. While trained primarily on bi-arm robotic platforms, the company demonstrated the technology working on various robot configurations, including Apptronik's humanoid Apollo robot.
The potential applications for these models span both consumer and industrial settings, though Google DeepMind hasn't announced specific commercial products or a timeline for wider availability beyond its current testing partnerships.
The introduction of these models underscores a growing trend of AI companies moving beyond purely digital applications to create systems that can interact with and manipulate the physical world. For Google DeepMind, which has primarily focused on research and digital AI applications, this represents a significant expansion of its technological ambitions.
While Gemini Robotics represents a significant advancement, it's still very early days, and many challenges remain in refining robotic dexterity, real-time decision-making, and broader generalization. However, this release lays the groundwork for AI-driven robots that can assist in homes, workplaces, and beyond.