Gemini Robotics: Bringing AI into the Physical World

📅 2025-03-25
📈 Citations: 3
Influential: 0
🤖 AI Summary
Deploying embodied agents in the physical world remains difficult because of open problems in multimodal understanding, generalization to new environments, and transfer across tasks. This paper introduces a family of AI models for robotics built on Gemini 2.0. It presents two models: Gemini Robotics, a Vision-Language-Action (VLA) generalist that directly controls robots, and Gemini Robotics-ER (Embodied Reasoning), which extends Gemini's multimodal reasoning with enhanced spatial and temporal understanding, including 3D perception and multi-view correspondence. Gemini Robotics produces smooth, reactive manipulation, follows open-vocabulary instructions, and generalizes to unseen environments and objects. With additional fine-tuning, it can be specialized to long-horizon, highly dexterous tasks, learn new short-horizon tasks from as few as 100 demonstrations, and adapt to novel robot embodiments. Real-robot evaluations cover complex, long-horizon dexterous manipulation and previously unseen scenes, and the report discusses and addresses safety considerations for this new class of robotics foundation models.
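To make the multi-view (cross-view) correspondence capability concrete: once the same point has been matched in two calibrated camera views, its 3D position can be recovered by triangulation. The sketch below uses standard linear (DLT) triangulation with NumPy. It is a generic geometry illustration, not the method used by Gemini Robotics-ER, and the camera intrinsics, the 0.2 m baseline, and the test point are hypothetical values chosen for the example.

```python
import numpy as np

def triangulate_point(P1, P2, uv1, uv2):
    """Linear (DLT) triangulation of one 3D point from two calibrated views.

    P1, P2: 3x4 camera projection matrices.
    uv1, uv2: matching pixel coordinates (u, v) of the same point in each view.
    Returns the 3D point in world coordinates.
    """
    u1, v1 = uv1
    u2, v2 = uv2
    # Each view contributes two linear constraints on the homogeneous point X:
    # u * (P[2] . X) - (P[0] . X) = 0 and v * (P[2] . X) - (P[1] . X) = 0.
    A = np.array([
        u1 * P1[2] - P1[0],
        v1 * P1[2] - P1[1],
        u2 * P2[2] - P2[0],
        v2 * P2[2] - P2[1],
    ])
    # The least-squares solution is the right singular vector
    # associated with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D world point to pixel coordinates with a 3x4 camera matrix."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Hypothetical setup: two pinhole cameras with identical intrinsics,
# the second one displaced 0.2 m along the x axis.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
P_left = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P_right = K @ np.hstack([np.eye(3), np.array([[-0.2], [0.0], [0.0]])])

point = np.array([0.1, -0.05, 1.5])
uv_left = project(P_left, point)
uv_right = project(P_right, point)
print(triangulate_point(P_left, P_right, uv_left, uv_right))  # approximately [0.1, -0.05, 1.5]
```

With noise-free correspondences the linear solution recovers the point up to floating-point error; with real, noisy matches it is usually used as an initial estimate before a nonlinear refinement.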

📝 Abstract
Recent advancements in large multimodal models have led to the emergence of remarkable generalist capabilities in digital domains, yet their translation to physical agents such as robots remains a significant challenge. This report introduces a new family of AI models purposefully designed for robotics and built upon the foundation of Gemini 2.0. We present Gemini Robotics, an advanced Vision-Language-Action (VLA) generalist model capable of directly controlling robots. Gemini Robotics executes smooth and reactive movements to tackle a wide range of complex manipulation tasks while also being robust to variations in object types and positions, handling unseen environments as well as following diverse, open-vocabulary instructions. We show that with additional fine-tuning, Gemini Robotics can be specialized to new capabilities including solving long-horizon, highly dexterous tasks, learning new short-horizon tasks from as few as 100 demonstrations, and adapting to completely novel robot embodiments. This is made possible because Gemini Robotics builds on top of the Gemini Robotics-ER model, the second model we introduce in this work. Gemini Robotics-ER (Embodied Reasoning) extends Gemini's multimodal reasoning capabilities into the physical world, with enhanced spatial and temporal understanding. This enables capabilities relevant to robotics including object detection, pointing, trajectory and grasp prediction, as well as multi-view correspondence and 3D bounding box predictions. We show how this novel combination can support a variety of robotics applications. We also discuss and address important safety considerations related to this new class of robotics foundation models. The Gemini Robotics family marks a substantial step towards developing general-purpose robots that realize AI's potential in the physical world.
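Pointing and 3D predictions are often combined in practice: a 2D point predicted on an image can be lifted into 3D when a depth measurement and the camera intrinsics are available. The sketch below assumes, for illustration only, that a predicted point arrives as (y, x) on a 0 to 1000 normalized grid. That convention, the helper names, the intrinsics, and the depth value are all assumptions, so check the actual output format of whatever model or API you use.

```python
import numpy as np

def normalized_point_to_pixel(point_yx, width, height, scale=1000.0):
    """Convert a (y, x) point on an assumed 0..scale grid to (u, v) pixel coordinates."""
    y, x = point_yx
    return x / scale * width, y / scale * height

def backproject(u, v, depth, K):
    """Lift pixel (u, v) with metric depth (along the optical axis) into the camera frame.

    Uses the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    return np.array([(u - cx) * depth / fx, (v - cy) * depth / fy, depth])

# Hypothetical 640x480 camera intrinsics and a hypothetical pointing output.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
u, v = normalized_point_to_pixel((625, 750), width=640, height=480)
print(u, v)  # 480.0 300.0
xyz = backproject(u, v, depth=0.8, K=K)
print(xyz)  # camera-frame point, approximately [0.256, 0.096, 0.8]
```

The result is expressed in the camera frame; a robot would still need the camera-to-base extrinsic transform to turn it into a reachable target.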
Problem

Research questions and friction points this paper is trying to address.

Developing AI models for direct robot control in physical environments
Enhancing robustness to object variations and unseen environments
Enabling adaptation to novel robot embodiments and tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-Language-Action generalist model for robots
Enhanced spatial and temporal understanding for robotics
Fine-tuning for long-horizon and novel robot tasks
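The abstract does not describe how Gemini Robotics is fine-tuned on new tasks, so the following is only a toy illustration of the general idea behind learning a task from about 100 demonstrations: behavior cloning, i.e. fitting a policy by supervised regression from observations to the actions a demonstrator took. The linear policy, the synthetic "expert", and all dimensions are hypothetical; Gemini Robotics itself is a large VLA model, not a linear regressor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical task: a 6-D observation maps to a 3-D action through an unknown expert.
obs_dim, act_dim, n_demos, steps_per_demo = 6, 3, 100, 20
W_expert = rng.normal(size=(obs_dim, act_dim))

# Hypothetical data: 100 synthetic demonstrations of 20 steps each, with noisy expert actions.
n_samples = n_demos * steps_per_demo
obs = rng.normal(size=(n_samples, obs_dim))
actions = obs @ W_expert + 0.01 * rng.normal(size=(n_samples, act_dim))

def fit_linear_policy(obs, actions, ridge=1e-3):
    """Behavior cloning as ridge regression.

    Minimizes ||obs @ W - actions||^2 + ridge * ||W||^2 in closed form.
    """
    d = obs.shape[1]
    return np.linalg.solve(obs.T @ obs + ridge * np.eye(d), obs.T @ actions)

W_policy = fit_linear_policy(obs, actions)

# Evaluate on held-out observations from the same hypothetical task.
test_obs = rng.normal(size=(200, obs_dim))
mse = np.mean((test_obs @ W_policy - test_obs @ W_expert) ** 2)
print(f"held-out action MSE: {mse:.2e}")
```

The ridge term keeps the solve well-conditioned when demonstrations are scarce relative to the observation dimension; with high-dimensional observations such as images, the same supervised objective is typically applied to a neural network instead.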
Authors

Gemini Robotics Team: Google DeepMind
Saminda Abeyruwan: University of Miami (Artificial Intelligence, Autonomous Learning, Machine Learning, Reinforcement Learning, Semantic Web)
Joshua Ainslie: Google LLC (Machine Learning)
Jean-Baptiste Alayrac: DeepMind, London (Computer Vision, Machine Learning, Artificial Intelligence)
Montse Gonzalez Arenas: Google DeepMind
Travis Armstrong: Google DeepMind
Ashwin Balakrishna: Physical Intelligence (Robotics, Machine Learning, Reinforcement Learning, Imitation Learning)
Robert Baruch: Google DeepMind
Maria Bauzá: Google DeepMind
M. Blokzijl: Google DeepMind
Steven Bohez: Google DeepMind (deep learning, reinforcement learning, robotics)
Konstantinos Bousmalis: DeepMind (Machine Learning, Computer Vision, Robotics)
Anthony Brohan: Google DeepMind
Thomas Buschmann: Google DeepMind
Arunkumar Byravan: Google DeepMind
Serkan Cabi: Google DeepMind
Ken Caluwaerts: Google DeepMind (Robotics, Reinforcement Learning, Machine Learning, Legged Locomotion, Tensegrity)
Federico Casarini: Google DeepMind
Oscar Chang: Google DeepMind
José Enrique Chen: Google DeepMind
Xi Chen: Google DeepMind
Hao-Tien Lewis Chiang: Google DeepMind
Krzysztof Choromanski: Google DeepMind Robotics & Columbia University (robotics, reinforcement learning, efficient Transformers, quasi Monte Carlo methods)
David D'Ambrosio: Google DeepMind
Sudeep Dasari: Google DeepMind (Robotic Learning, Unsupervised Learning, Computer Vision)
Todor Davchev: Research Scientist, Google DeepMind (Machine Learning, Robot Learning, Reinforcement Learning)
Coline Devin: DeepMind (Artificial Intelligence, Machine Learning, Reinforcement Learning)
Norman Di Palo: Google DeepMind (Robot Learning, Imitation Learning, Reinforcement Learning)
Tianli Ding: Google DeepMind
Adil Dostmohamed: Google DeepMind
Danny Driess: Google DeepMind (Machine Learning, Robotics)
Yilun Du: Harvard University (Artificial Intelligence, Machine Learning, Robotics, Computer Vision)
Debidatta Dwibedi: Google DeepMind (Artificial Intelligence, Computer Vision, Machine Learning, Reinforcement Learning, Imitation)
Michael Elabd: Google DeepMind
Claudio Fantacci: Robotics Research Engineer at Google DeepMind (Deep learning, Robotics learning, Large models, Large scale training, Kalman and Particle filters)
Cody Fong: Google DeepMind
Erik Frey: Google DeepMind
Chuyuan Fu: Google DeepMind (Robotics, Simulation, Computer Graphics, Solid and Fluid Mechanics)
M. Giustina: Google DeepMind
K. Gopalakrishnan: Google DeepMind
L. Graesser: Google DeepMind
Leonard Hasenclever: Research Scientist at DeepMind (Machine Learning, Reinforcement Learning, Statistics)
N. Heess: Google DeepMind
Brandon Hernaez: Google DeepMind
Alex Herzog: Google DeepMind
R. A. Hofer: Google DeepMind
Jan Humplik: Research Scientist, DeepMind (AI, Robotics)
Atil Iscen: Google (Robotics, Reinforcement Learning, Evolutionary Algorithms, Multi-agent Learning)
M. Jacob: Google DeepMind
Deepali Jain: Google DeepMind (Artificial Intelligence, Robotics, Reinforcement Learning)
Ryan C. Julian: Google DeepMind (Robotics, Machine Learning, Imitation Learning, Reinforcement Learning)
Dmitry Kalashnikov: Google (Robotics, Machine Learning, Reinforcement Learning)
M. E. Karagozler: Google DeepMind
Stefani Karp: Google DeepMind
Chase Kew: Google DeepMind
Jerad Kirkland: Google DeepMind
Sean Kirmani: Google DeepMind (Artificial Intelligence, Neural Networks, Computer Vision, Robotics)
Yuheng Kuang: Google DeepMind
Thomas Lampe: DeepMind
Antoine Laurens: Google DeepMind
Isabel Leal: Google DeepMind
Alex X. Lee: Research Scientist, Google DeepMind (Artificial Intelligence, Robotics, Computer Vision)
T. Lee: Google DeepMind
Jacky Liang: Google DeepMind (Robotics, Foundation Models)
Yixin Lin: DeepMind
Sharath Maddineni: Google DeepMind
Anirudha Majumdar: Associate Professor, Princeton University & Visiting Research Scientist, Google DeepMind (Robotics, Machine Learning, Motion Planning, Control)
A. Michaely: Google DeepMind
Robert Moreno: Google DeepMind
M. Neunert: Google DeepMind
Francesco Nori: Google DeepMind
Carolina Parada: Google DeepMind
Emilio Parisotto: Carnegie Mellon University
Peter Pastor: Roboticist, Google (Mobile Manipulation, Humanoid Robotics, Machine Learning, Motor Primitives)
A. Pooley: Google DeepMind
Kanishka Rao: Software Engineer, Google (deep learning)
Krista Reymann: Google DeepMind
Dorsa Sadigh: Stanford University (Robotics, Human-Robot Interaction, Machine Learning, Artificial Intelligence, Control Theory)
Stefano Saliceti: Google DeepMind
Pannag R. Sanketi: Google DeepMind
P. Sermanet: Google DeepMind
Dhruv Shah: Princeton University, Google DeepMind (Robot Learning, Artificial Intelligence, Robotics, Reinforcement Learning)
Mohit Sharma: Google DeepMind
Kathryn Shea: Google DeepMind
Charles Shu: Google DeepMind
Vikas Sindhwani: Google DeepMind Robotics (AI, Robotics, AI Safety, Machine Learning, Optimization)
Sumeet Singh: Google DeepMind
Radu Soricut: Distinguished Scientist at Google (VLMs, Multimodal Modeling)
Jost Tobias Springenberg: Google DeepMind (Machine Learning)
Rachel Sterneck: Google DeepMind
Razvan Surdulescu: Google DeepMind
Jie Tan: Google DeepMind (Artificial General Intelligence, Robotics, Foundation Model, Computer Graphics, Vision)
Jonathan Tompson: Meta Reality Labs (Computer Science)
Vincent Vanhoucke: Distinguished Engineer, Waymo (Robot Learning, Robotics, Machine Learning, Computer Vision, Artificial Intelligence)
Jake Varley: Google DeepMind
Grace Vesom: Google DeepMind (Computer Vision, Machine Learning, Shape Representation)
Giulia Vezzani: Google DeepMind
O. Vinyals: Google DeepMind
Ayzaan Wahid: Google DeepMind
Stefan Welker: Google DeepMind
Paul Wohlhart: Everyday Robots (Artificial Intelligence, Computer Vision, Robotics)
Fei Xia: Google DeepMind
Ted Xiao: Staff Research Scientist, Google DeepMind (Deep Learning, Artificial Intelligence, Robotics, Reinforcement Learning, Control Theory)
Annie Xie: Google DeepMind
Jinyu Xie: Google DeepMind
Peng Xu: Google DeepMind
Sichun Xu: Google DeepMind
Ying Xu: Google DeepMind
Zhuo Xu: Wuhan University (Multi-sensor fusion positioning, visual SLAM)
Yuxiang Yang: Google DeepMind
Rui Yao: Google DeepMind
Sergey Yaroshenko: Google DeepMind
Wenhao Yu: Google DeepMind
Wentao Yuan: Google DeepMind (3D Computer Vision, Robotics)
Jingwei Zhang: Google DeepMind
Tingnan Zhang: Google DeepMind (foundation models, reinforcement learning, robotics, locomotion)
Allan Zhou: Google DeepMind (Machine Learning)
Yuxiang Zhou: Postdoctoral Researcher, Queen Mary University of London (Natural Language Processing, Large Language Model)