🤖 AI Summary
Large language models (LLMs) lack embodied interaction experience with physical environments, which hinders genuine embodied decision-making. To address this, we propose the concept of a “training ground” for embodied decision-making and present EmboMatrix, the first such platform. EmboMatrix is a scalable system that integrates a multi-agent data engine, a distributed heterogeneous-hardware system, and a hierarchical reward architecture to support large-scale simulation, multi-agent collaboration, and fine-grained behavioral supervision. The approach unifies LLM-driven decision-making, high-fidelity physics-based simulation, multi-agent synthetic data generation, and multi-level reward modeling. Using this framework, we train EmboBrain-7B, a 7-billion-parameter embodied reasoning model. On two embodied decision-making benchmarks, EmboBrain-7B outperforms the 671-billion-parameter DeepSeek-R1 by 9.5%, demonstrating that embodied capabilities can be acquired effectively and scalably through interaction with the environment.
📝 Abstract
Embodied decision-making enables agents to translate high-level goals into executable actions through continuous interaction with the physical world, forming a cornerstone of general-purpose embodied intelligence. Large language models (LLMs), with their general decision-making capabilities, offer a promising path to realize this potential; however, LLMs trained solely on language lack exposure to physical environments, which limits their true embodied understanding. To bridge this gap, we propose the concept of a training ground: a comprehensive infrastructure that provides task and scene simulation, embodied interaction, and feedback signals, offering a one-stop solution for LLMs to acquire genuine embodied decision-making skills. In this work, we present EmboMatrix, the first training ground of its kind, which provides a massive and diverse set of tasks with efficient simulation and precise rewards. EmboMatrix incorporates a series of novel techniques: a multi-agent data engine for large-scale task and scene generation, a distributed heterogeneous-hardware system for scalable simulation, and a multi-level reward architecture for precise supervision. Leveraging EmboMatrix, we cultivate EmboBrain, an LLM whose embodied decision-making abilities emerge from extensive embodied interactions. Experiments show that EmboBrain-7B surpasses the 671B DeepSeek-R1 baseline by 9.5% on two challenging embodied decision-making benchmarks, demonstrating the power of interactive, environment-grounded learning for building truly intelligent embodied agents.