🤖 AI Summary
Existing embodied lifelong learning systems typically optimize individual stages, such as data collection or deployment, in isolation, which hinders sustained improvement and cross-environment generalization. This paper introduces Arcadia, the first holistic framework to model embodied learning as an indivisible closed-loop lifecycle encompassing four tightly coupled stages: autonomous exploration, generative scene reconstruction, shared multimodal representation learning, and simulation-driven evolution. Its key innovations are establishing the first sim-from-real feedback loop between physical and virtual domains and unifying self-evolving exploration, generative data augmentation, and multimodal representation learning within a single architecture. Arcadia enables reproducible, cross-task and cross-environment evaluation. Empirically, it achieves continuous performance gains on navigation and manipulation benchmarks and successfully transfers learned policies to real-world robots, demonstrating robustness and generalizability.
📝 Abstract
We contend that embodied learning is fundamentally a lifecycle problem rather than a single-stage optimization. Systems that optimize only one link (data collection, simulation, learning, or deployment) rarely sustain improvement or generalize beyond narrow settings. We introduce Arcadia, a closed-loop framework that operationalizes embodied lifelong learning by tightly coupling four stages: (1) self-evolving exploration and grounding for autonomous data acquisition in physical environments, (2) generative scene reconstruction and augmentation for realistic and extensible scene creation, (3) a shared embodied representation architecture that unifies navigation and manipulation within a single multimodal backbone, and (4) sim-from-real evaluation and evolution that closes the feedback loop through simulation-based adaptation. This coupling is non-decomposable: removing any stage breaks the improvement loop and reverts the system to one-shot training. Arcadia delivers consistent gains on navigation and manipulation benchmarks and transfers robustly to physical robots, indicating that a tightly coupled lifecycle of continuous real-world data acquisition, generative simulation update, and shared-representation learning supports lifelong improvement and end-to-end generalization. We release standardized interfaces that enable reproducible evaluation and cross-model comparison in reusable environments, positioning Arcadia as a scalable foundation for general-purpose embodied agents.
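The four-stage lifecycle can be pictured as a minimal closed loop in which each stage feeds the next and the last stage feeds back into the first. The sketch below is purely illustrative scaffolding based on the abstract: the class `ArcadiaLoop`, its method names, and the toy data are assumptions, not the paper's actual implementation.

```python
# Illustrative sketch of a four-stage closed-loop lifecycle in the spirit of
# Arcadia's abstract. All names and data structures here are hypothetical.

class ArcadiaLoop:
    def __init__(self):
        self.real_data = []      # stand-in for trajectories from the physical world
        self.sim_scenes = []     # stand-in for generated simulation environments
        self.policy_version = 0  # stand-in for the shared multimodal backbone

    def explore(self):
        """Stage 1: self-evolving exploration acquires real-world data."""
        self.real_data.append(f"trajectory_{len(self.real_data)}")

    def reconstruct(self):
        """Stage 2: generative reconstruction turns real data into sim scenes."""
        self.sim_scenes.append(f"scene_from_{self.real_data[-1]}")

    def train_shared_policy(self):
        """Stage 3: update the shared navigation + manipulation backbone."""
        self.policy_version += 1

    def evolve_in_sim(self):
        """Stage 4: sim-from-real evaluation, whose outcome steers exploration."""
        return {"policy": self.policy_version, "scenes": len(self.sim_scenes)}

    def run(self, cycles):
        # Non-decomposability: skipping any stage would stall the next one,
        # collapsing the lifecycle back to one-shot training.
        report = {}
        for _ in range(cycles):
            self.explore()
            self.reconstruct()
            self.train_shared_policy()
            report = self.evolve_in_sim()
        return report

print(ArcadiaLoop().run(3))
```

The point of the loop structure is that data, scenes, and the policy all advance together on every iteration, which is the "lifecycle" framing the abstract argues for.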