Abstract
General-purpose robots need a deep understanding of the physical world, advanced reasoning, and general and dexterous control. This report introduces the latest generation of the Gemini Robotics model family: Gemini Robotics 1.5, a multi-embodiment Vision-Language-Action (VLA) model, and Gemini Robotics-ER 1.5, a state-of-the-art Embodied Reasoning (ER) model. It brings together three major innovations. First, Gemini Robotics 1.5 features a novel architecture and a Motion Transfer (MT) mechanism, which enables it to learn from heterogeneous, multi-embodiment robot data and makes the VLA more general. Second, Gemini Robotics 1.5 interleaves actions with a multi-level internal reasoning process in natural language. This enables the robot to "think before acting", notably improves its ability to decompose and execute complex, multi-step tasks, and makes the robot's behavior more interpretable to the user. Third, Gemini Robotics-ER 1.5 establishes a new state of the art for embodied reasoning, i.e., for reasoning capabilities that are critical for robots, such as visual and spatial understanding, task planning, and progress estimation. Together, this family of models takes us a step towards an era of physical agents, enabling robots to perceive, think, and then act so they can solve complex multi-step tasks.
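The interleaving of natural-language reasoning with low-level actions described above can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in, not the actual Gemini Robotics 1.5 API: `plan_subtasks` mimics the "think" phase that decomposes a long-horizon instruction into natural-language subtasks, and `act` mimics the "act" phase that executes each subtask, so the resulting trace is both the control loop and an interpretable record of behavior.

```python
def plan_subtasks(instruction: str) -> list[str]:
    """Stand-in for the model's internal reasoning: decompose a
    long-horizon instruction into natural-language subtasks before
    any action is emitted (toy lookup, not a real planner)."""
    plans = {
        "sort the laundry by color": [
            "pick up a garment",
            "check its color",
            "place it in the matching bin",
        ],
    }
    # Fall back to treating the instruction as a single subtask.
    return plans.get(instruction, [instruction])


def act(subtask: str, state: dict) -> dict:
    """Stand-in for low-level action generation: record the subtask
    as executed, yielding an interpretable trace of behavior."""
    state["log"].append(subtask)
    return state


def think_then_act(instruction: str) -> list[str]:
    """Interleave reasoning and acting: plan first, then execute
    each subtask in order, returning the behavior trace."""
    state = {"log": []}
    for subtask in plan_subtasks(instruction):  # "think" phase
        state = act(subtask, state)             # "act" phase
    return state["log"]


trace = think_then_act("sort the laundry by color")
print(trace)
```

The key design point the sketch captures is that the plan exists as explicit natural language before execution begins, which is what makes multi-step decomposition and user-facing interpretability possible.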