🤖 AI Summary
To address weak lifelong adaptability, poor scalability, and insufficient scheduling robustness in multi-robot systems, this paper proposes RoboOS-NeXT, a memory-augmented collaborative framework. These limitations arise from agent-centric, short-term memory architectures that hinder long-term learning, scaling to heterogeneous teams, and fault recovery. The core innovation is Spatio-Temporal-Embodiment Memory (STEM), a unified representation that integrates spatial structure, temporal events, and embodiment features to provide a global memory shared across heterogeneous robots and to support brain-cerebellum hierarchical coordination. Coupled with a vision-language-action model and a hierarchical control architecture, the framework establishes a closed “cognition–memory–execution” loop. Evaluated in restaurant, supermarket, and home environments, it significantly improves task completion rate and collaboration efficiency, supports scalable deployment across more than 1,000 robots, sustains continuous operation for over 72 hours, and enables autonomous recovery from dynamic failures.
📝 Abstract
The proliferation of collaborative robots across diverse tasks and embodiments presents a central challenge: achieving lifelong adaptability, scalable coordination, and robust scheduling in multi-agent systems. Existing approaches, from vision-language-action (VLA) models to hierarchical frameworks, fall short due to their reliance on limited or individual-agent memory. This fundamentally constrains their ability to learn over long horizons, scale to heterogeneous teams, or recover from failures, highlighting the need for a unified memory representation. To address these limitations, we introduce RoboOS-NeXT, a unified memory-based framework for lifelong, scalable, and robust multi-robot collaboration. At the core of RoboOS-NeXT is the novel Spatio-Temporal-Embodiment Memory (STEM), which integrates spatial scene geometry, temporal event history, and embodiment profiles into a shared representation. This memory-centric design is integrated into a brain-cerebellum framework, where a high-level brain model performs global planning by retrieving and updating STEM, while low-level controllers execute actions locally. This closed loop between cognition, memory, and execution enables dynamic task allocation, fault-tolerant collaboration, and consistent state synchronization. We conduct extensive experiments spanning complex coordination tasks in restaurants, supermarkets, and households. Our results demonstrate that RoboOS-NeXT achieves superior performance across heterogeneous embodiments, validating its effectiveness in enabling lifelong, scalable, and robust multi-robot collaboration. Project website: https://flagopen.github.io/RoboOS/
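As a rough illustration of the memory-centric loop described in the abstract, the following minimal Python sketch models a shared memory with spatial, temporal, and embodiment views, plus a brain-level allocation step that reads and updates it. This is not the RoboOS-NeXT implementation or API: the names `SharedMemory`, `Event`, `allocate`, and `report_failure`, the skill-matching rule, and the example robots are all hypothetical, chosen only to show how a single shared memory could support task allocation and reallocation after a failure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Event:
    """One entry in the temporal event history (hypothetical schema)."""
    step: int
    robot: str
    description: str


@dataclass
class SharedMemory:
    """Toy stand-in for a STEM-like memory; all fields are assumptions."""
    spatial: Dict[str, str] = field(default_factory=dict)           # object -> location
    events: List[Event] = field(default_factory=list)               # temporal history
    embodiments: Dict[str, Set[str]] = field(default_factory=dict)  # robot -> skills
    failed: Set[str] = field(default_factory=set)                   # robots marked as failed

    def log(self, robot: str, description: str) -> None:
        # Append an event; the step index is simply its position in the log.
        self.events.append(Event(len(self.events), robot, description))


def allocate(memory: SharedMemory, skill: str) -> Optional[str]:
    """Brain-level step: pick the first healthy robot (by name) with the skill."""
    for robot in sorted(memory.embodiments):
        if skill in memory.embodiments[robot] and robot not in memory.failed:
            memory.log(robot, f"assigned:{skill}")
            return robot
    return None


def report_failure(memory: SharedMemory, robot: str) -> None:
    """Record a failure so later allocations skip this robot."""
    memory.failed.add(robot)
    memory.log(robot, "failed")


# Usage: allocate a grasp task, simulate a failure, then reallocate.
memory = SharedMemory(
    spatial={"cup": "kitchen_counter"},
    embodiments={"arm_1": {"grasp"}, "humanoid_1": {"grasp", "navigate"}},
)
first = allocate(memory, "grasp")
report_failure(memory, first)
second = allocate(memory, "grasp")
```

Because both the failure and the assignments are written to the same memory, the second allocation skips the failed robot without any robot-to-robot messaging, which is the kind of consistent state synchronization the abstract attributes to a shared representation.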