🤖 AI Summary
To address poor generalization, difficult Sim2Real transfer, and limited cross-embodiment adaptation in general-purpose navigation through cluttered environments, this paper proposes X-Mobility, an end-to-end, generalizable navigation model. It rests on three ideas: (1) an autoregressive world model with a latent state space that captures world dynamics; (2) a set of multi-head decoders that push the latent state toward a representation useful for navigation; and (3) a decoupling of world modeling from the action policy, so that the model can train on data with and without expert policies: off-policy data teaches world dynamics, while on-policy data with supervisory control teaches the action policy. In the paper's experiments, X-Mobility surpasses current state-of-the-art navigation approaches, transfers zero-shot from simulation to real robots, and shows strong potential for cross-embodiment generalization.
📝 Abstract
General-purpose navigation in challenging environments remains a significant problem in robotics, and current state-of-the-art approaches have notable limitations. Classical approaches struggle with cluttered settings and require extensive tuning, while learning-based methods have difficulty generalizing to out-of-distribution environments. This paper introduces X-Mobility, an end-to-end generalizable navigation model that addresses these challenges with three key ideas. First, X-Mobility employs an autoregressive world modeling architecture with a latent state space to capture world dynamics. Second, a diverse set of multi-head decoders enables the model to learn a rich state representation that correlates strongly with effective navigation skills. Third, by decoupling world modeling from the action policy, our architecture can train effectively on a variety of data sources, both with and without expert policies: off-policy data allows the model to learn world dynamics, while on-policy data with supervisory control enables optimal action policy learning. Through extensive experiments, we demonstrate that X-Mobility not only generalizes effectively but also surpasses current state-of-the-art navigation approaches. X-Mobility also achieves zero-shot Sim2Real transferability and shows strong potential for cross-embodiment generalization.
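The three ideas in the abstract can be pictured as a data flow: a latent state updated autoregressively from observations and past actions, several decoder heads reading that latent, and a separate policy that sees only the latent. The sketch below is a toy illustration of that structure in plain Python, not the paper's implementation. The class names, the dimensions, the single-layer `tanh` update, and the decoder head names (`reconstruction`, `semantics`) are all assumptions made for illustration, and training is omitted.

```python
import math
import random


def linear(weights, vec):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def init(rows, cols, rng):
    """Small random weight matrix; stands in for learned parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class WorldModel:
    """Toy autoregressive latent model (hypothetical, for illustration).

    The latent is updated as z_t = tanh(W [z_{t-1}, obs_t, a_{t-1}]).
    """

    def __init__(self, obs_dim, act_dim, latent_dim, rng):
        self.latent_dim = latent_dim
        self.w_step = init(latent_dim, latent_dim + obs_dim + act_dim, rng)
        # Multi-head decoders: the head names are assumptions, not the paper's.
        self.heads = {
            "reconstruction": init(obs_dim, latent_dim, rng),
            "semantics": init(3, latent_dim, rng),
        }

    def step(self, latent, obs, prev_action):
        """Advance the latent state by one time step."""
        return [math.tanh(v) for v in linear(self.w_step, latent + obs + prev_action)]

    def decode(self, latent):
        """Run every decoder head on the same latent state."""
        return {name: linear(w, latent) for name, w in self.heads.items()}


class Policy:
    """Action policy kept separate from the world model; it reads only the latent."""

    def __init__(self, latent_dim, act_dim, rng):
        self.w = init(act_dim, latent_dim, rng)

    def act(self, latent):
        return linear(self.w, latent)


def rollout(world_model, policy, observations, act_dim):
    """Feed observations in order, carrying the latent and last action forward."""
    latent = [0.0] * world_model.latent_dim
    action = [0.0] * act_dim
    actions = []
    for obs in observations:
        latent = world_model.step(latent, obs, action)
        action = policy.act(latent)
        actions.append(action)
    return latent, actions


rng = random.Random(0)
wm = WorldModel(obs_dim=4, act_dim=2, latent_dim=8, rng=rng)
pi = Policy(latent_dim=8, act_dim=2, rng=rng)
obs_seq = [[rng.random() for _ in range(4)] for _ in range(5)]
final_latent, actions = rollout(wm, pi, obs_seq, act_dim=2)
decoded = wm.decode(final_latent)
```

Because `Policy` depends only on the latent state, the world model's parameters could in principle be fit on data that has no expert actions, with the policy fit separately on data that does. That is one reading of the decoupling described above, not a statement of the authors' training procedure.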