ACE-Brain-0: Spatial Intelligence as a Shared Scaffold for Universal Embodiments

📅 2026-03-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of balancing generalization and domain specialization in universal embodied intelligence across heterogeneous platforms such as autonomous vehicles, robots, and drones. Training a single model across these platforms is often hindered by long-tailed data distributions, gradient interference, and catastrophic forgetting. The authors propose ACE-Brain-0, a framework that uses spatial intelligence as a shared cognitive scaffold across diverse embodiments. It introduces a Scaffold-Specialize-Reconcile (SSR) training paradigm: first establishing a shared spatial cognition foundation, then training domain-specific experts, and finally combining them through data-free model merging. Group Relative Policy Optimization (GRPO) is then applied to strengthen the model's overall capability. Evaluated on 24 benchmarks spanning spatial and embodied intelligence, ACE-Brain-0 achieves competitive or state-of-the-art performance while balancing generalizability with task-specific expertise.
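To make the "data-free model merging" step of SSR more concrete, here is a minimal sketch of one common data-free technique, task-arithmetic-style averaging of expert weight deltas around a shared base. The summary does not say which merging method ACE-Brain-0 uses, so the function `merge_experts`, the `scale` parameter, and the toy weights below are all illustrative assumptions rather than the authors' procedure.

```python
from typing import Dict, List

# A "model" here is just a mapping from parameter name to a flat list of floats.
Weights = Dict[str, List[float]]


def merge_experts(base: Weights, experts: List[Weights], scale: float = 1.0) -> Weights:
    """Hypothetical task-arithmetic-style merge: base + scale * mean(expert - base).

    No training data is needed: only the weights of the shared base
    (the "scaffold") and of each domain expert are combined.
    """
    merged: Weights = {}
    for name, base_vals in base.items():
        merged_vals = []
        for i, b in enumerate(base_vals):
            # Average how far each expert moved this parameter away from the base.
            mean_delta = sum(e[name][i] - b for e in experts) / len(experts)
            merged_vals.append(b + scale * mean_delta)
        merged[name] = merged_vals
    return merged


# Toy example (made-up numbers): one shared scaffold and two domain experts.
scaffold = {"proj": [0.0, 1.0]}
driving_expert = {"proj": [0.2, 1.0]}
manipulation_expert = {"proj": [0.0, 1.4]}

merged = merge_experts(scaffold, [driving_expert, manipulation_expert])
print(merged)
```

Each expert only contributes its difference from the scaffold, so parameters that no expert changed stay at their scaffold values.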

📝 Abstract
Universal embodied intelligence demands robust generalization across heterogeneous embodiments, such as autonomous driving, robotics, and unmanned aerial vehicles (UAVs). However, training a unified embodied brain over diverse embodiments frequently runs into long-tailed data, gradient interference, and catastrophic forgetting, making it notoriously difficult to balance universal generalization with domain-specific proficiency. In this report, we introduce ACE-Brain-0, a generalist foundation brain that unifies spatial reasoning, autonomous driving, and embodied manipulation within a single multimodal large language model (MLLM). Our key insight is that spatial intelligence serves as a universal scaffold across diverse physical embodiments: although vehicles, robots, and UAVs differ drastically in morphology, they share a common need for modeling 3D mental space, making spatial cognition a natural, domain-agnostic foundation for cross-embodiment transfer. Building on this insight, we propose the Scaffold-Specialize-Reconcile (SSR) paradigm, which first establishes a shared spatial foundation, then cultivates domain-specialized experts, and finally harmonizes them through data-free model merging. Furthermore, we adopt Group Relative Policy Optimization (GRPO) to strengthen the model's comprehensive capability. Extensive experiments demonstrate that ACE-Brain-0 achieves competitive and even state-of-the-art performance across 24 spatial and embodiment-related benchmarks.
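For readers unfamiliar with GRPO, the sketch below illustrates its core idea as it is generally described: several responses are sampled for the same prompt, and each response's advantage is its reward normalized against the group's mean and standard deviation, which removes the need for a learned value function. The abstract gives no details of how ACE-Brain-0 configures GRPO, so the function name, the use of the population standard deviation, the `eps` term, and the example rewards are assumptions for illustration only.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against its own sampling group.

    advantage_i = (r_i - mean(r)) / (std(r) + eps)
    Using the population standard deviation and eps is an assumption here.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four sampled answers to the same prompt
# (1.0 = judged correct, 0.0 = judged incorrect).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Responses that score above the group average receive positive advantages and are reinforced; those below the average receive negative advantages.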
Problem

Research questions and friction points this paper is trying to address.

embodied intelligence
universal generalization
spatial intelligence
catastrophic forgetting
heterogeneous embodiments
Innovation

Methods, ideas, or system contributions that make the work stand out.

spatial intelligence
universal embodied intelligence
Scaffold-Specialize-Reconcile
multimodal large language model
model merging
Authors
Ziyang Gong
SJTU, THU, Shanghai AI Lab (OpenGVLab), SYSU
Embodied Spatial Intelligence

Zehang Luo
ACE Robotics

Anke Tang
Ph.D Student, Wuhan University
Machine Learning

Zhe Liu
ACE Robotics

Shi Fu
Nanyang Technological University

Zhi Hou
The University of Sydney
Computer Vision, Machine Learning

Ganlin Yang
University of Science and Technology of China; Shanghai AI Laboratory
Computer vision, 3D vision, Multimodal models

Weiyun Wang
Shanghai AI Laboratory; Fudan University
Vision-Language Model, MLLM, Foundation Model

Xiaofeng Wang
ACE Robotics

Jianbo Liu
University of Notre Dame

Gen Luo
Shanghai AI Laboratory
computer vision, vision and language

Haolan Kang
The University of Hong Kong

Shuang Luo
Nanyang Technological University

Yue Zhou
Associate Professor, East China Normal University
Remote Sensing Vision-Language Model, Oriented Object Detection

Yong Luo
Wuhan University
Artificial Intelligence, Machine Learning, Data Mining, Pattern Classification and Search

Li Shen
Associate Professor, Sun Yat-sen University
Machine Learning, Optimization

Xiaosong Jia
Assistant Professor, Institute of Trustworthy Embodied AI (TEAI), Fudan University
Embodied AI, Autonomous Driving, World Model, Reinforcement Learning

Yao Mu
Shanghai Jiao Tong University

Xue Yang
Shanghai Jiao Tong University

Chunxiao Liu
Senior Research Director, SenseTime
Autonomous Driving, Prediction/Decision/Planning/Control, Reinforcement Learning, AI Agent

Junchi Yan
FIAPR & ICML Board Member, SJTU (2018-), SII (2024-), AWS (2019-2022), IBM (2011-2018)
Computational Intelligence, AI4Science, Machine Learning, Autonomous Driving

Hengshuang Zhao
The University of Hong Kong
Computer Vision, Machine Learning, Artificial Intelligence

Dacheng Tao
Nanyang Technological University
artificial intelligence, machine learning, computer vision, image processing, data mining

Xiaogang Wang
ACE Robotics