MiMo-Embodied: X-Embodied Foundation Model Technical Report

📅 2025-11-20
🤖 AI Summary
This paper introduces the first unified foundation model bridging embodied AI and autonomous driving, addressing the long-standing knowledge-transfer bottleneck caused by the two fields developing in isolation. Methodologically, the authors propose a multi-stage learning framework, construct a curated cross-domain dataset, and fine-tune the model with chain-of-thought (CoT) reasoning and reinforcement learning (RL) to enable positive transfer in both directions. The core contribution is the first semantic and decision-level alignment mechanism between the two domains, yielding complementary gains in task planning, affordance prediction, spatial understanding, state estimation, and driving policy generation. Experiments across 17 embodied AI and 12 autonomous driving benchmarks show that the model outperforms state-of-the-art open-source, closed-source, and domain-specific models, supporting both cross-domain generalization and mutual performance enhancement.

📝 Abstract
We open-source MiMo-Embodied, the first cross-embodied foundation model to successfully integrate and achieve state-of-the-art performance in both Autonomous Driving and Embodied AI. MiMo-Embodied sets new records across 17 embodied AI benchmarks in Task Planning, Affordance Prediction and Spatial Understanding, while also excelling in 12 autonomous driving benchmarks across Environmental Perception, Status Prediction, and Driving Planning. Across these tasks, MiMo-Embodied significantly outperforms existing open-source, closed-source, and specialized baselines. Our results indicate that through multi-stage learning, curated data construction, and CoT/RL fine-tuning, these two domains exhibit strong positive transfer and mutually reinforce one another. We provide a detailed analysis of our model design and training methodologies to facilitate further research. Code and models are available at https://github.com/XiaomiMiMo/MiMo-Embodied.
Problem

Research questions and friction points this paper is trying to address.

Integrates autonomous driving and embodied AI into a unified foundation model
Achieves state-of-the-art performance across 29 diverse benchmarks
Demonstrates positive transfer between embodied AI and autonomous driving
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-embodied foundation model integrating autonomous driving and embodied AI
Multi-stage learning with curated data construction
CoT and RL fine-tuning for positive transfer
Xiaoshuai Hao
Beijing Academy of Artificial Intelligence (BAAI)
vision and language
Lei Zhou
Xiaomi Embodied Intelligence Team
Zhijian Huang
Biochemistry Department and Beckman Institute, University of Illinois at Urbana-Champaign
modeling and simulation, quantum chemistry, membrane transporters and channels
Zhiwen Hou
Xiaomi Embodied Intelligence Team
Yingbo Tang
Institute of Automation, Chinese Academy of Sciences
Lingfeng Zhang
PhD student at Tsinghua University
embodied ai
Guang Li
Assistant Professor, Hokkaido University
Dataset Distillation, Self-Supervised Learning, Data-Centric AI, Medical Image Analysis
Zheng Lu
Xiaomi Embodied Intelligence Team
Shuhuai Ren
Peking University
Deep Learning, Natural Language Processing
Xianhui Meng
Xiaomi Embodied Intelligence Team
Yuchen Zhang
Xiaomi Embodied Intelligence Team
Jing Wu
Xiaomi Embodied Intelligence Team
Jinghui Lu
ByteDance Inc., School of Computer Science, University College Dublin
Natural Language Processing, Multi-Modality, LLM, Human-in-the-loop Learning
Chenxu Dang
Huazhong University of Science and Technology
Computer Vision, Autonomous Driving
Jiayi Guan
Xiaomi Embodied Intelligence Team
Jianhua Wu
Xiaomi Embodied Intelligence Team
Zhiyi Hou
Xiaomi Embodied Intelligence Team
Hanbing Li
Xiaomi Embodied Intelligence Team
Shumeng Xia
Xiaomi Embodied Intelligence Team
Mingliang Zhou
Xiaomi Embodied Intelligence Team
Yinan Zheng
Tsinghua University
Reinforcement Learning, Diffusion Models, Autonomous Driving, Robotics
Zihao Yue
Renmin University of China
Multimodal AI, Language Modeling
Shuhao Gu
Xiaomi
LLM, Vision-Language Model, AGI
Hao Tian
Xiaomi Embodied Intelligence Team
Yuannan Shen
Xiaomi Embodied Intelligence Team