X-MOBILITY: End-To-End Generalizable Navigation via World Modeling

📅 2024-10-23

🏛️ arXiv.org

📈 Citations: 2

✨ Influential: 0

🤖 AI Summary

To address poor generalization, difficult Sim2Real transfer, and limited cross-embodiment adaptation in general-purpose navigation through cluttered environments, this paper proposes X-Mobility, an end-to-end generalizable navigation model. It rests on three ideas: (1) an auto-regressive world model with a latent state space that captures world dynamics; (2) a set of multi-head decoders that push the latent state toward a representation useful for navigation; and (3) a decoupling of world modeling from the action policy, so that data without expert policies trains the world model while data with supervisory control trains the policy. In the authors' experiments, X-Mobility outperforms the state-of-the-art navigation approaches it is compared against, achieves zero-shot Sim2Real transfer, and shows strong potential for cross-embodiment generalization.

📝 Abstract

General-purpose navigation in challenging environments remains a significant problem in robotics, with current state-of-the-art approaches facing myriad limitations. Classical approaches struggle with cluttered settings and require extensive tuning, while learning-based methods face difficulties generalizing to out-of-distribution environments. This paper introduces X-Mobility, an end-to-end generalizable navigation model that overcomes existing challenges by leveraging three key ideas. First, X-Mobility employs an auto-regressive world modeling architecture with a latent state space to capture world dynamics. Second, a diverse set of multi-head decoders enables the model to learn a rich state representation that correlates strongly with effective navigation skills. Third, by decoupling world modeling from action policy, our architecture can train effectively on a variety of data sources, both with and without expert policies: off-policy data allows the model to learn world dynamics, while on-policy data with supervisory control enables optimal action policy learning. Through extensive experiments, we demonstrate that X-Mobility not only generalizes effectively but also surpasses current state-of-the-art navigation approaches. X-Mobility also achieves zero-shot Sim2Real transferability and shows strong potential for cross-embodiment generalization.

Problem

Research questions and friction points this paper is trying to address.

General-purpose navigation in challenging environments

Overcoming the tuning burden of classical methods in cluttered settings and the weak out-of-distribution generalization of learning-based methods

Achieving zero-shot Sim2Real transferability and cross-embodiment generalization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Auto-regressive world modeling with latent state space

Multi-head decoders for rich state representation

Decoupled world modeling and action policy training
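
The three ideas above can be illustrated with a toy data-flow sketch. This is not the paper's implementation: the class names, the dimensions, the tanh update, the random untrained weights, and the head names ("semantic", "reconstruction") are all hypothetical and chosen only to show how an auto-regressive latent state, several decoder heads, and a separate policy could fit together. No gradient updates are performed; the two loss functions only show which data each component would need.

```python
import math
import random


def linear(weights, vec):
    """Multiply a row-major weight matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def rand_matrix(rows, cols, rng):
    """Small random weights; a stand-in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


class WorldModel:
    """Hypothetical latent world model: z_t = tanh(W [z_{t-1}, a_{t-1}, o_t])."""

    def __init__(self, obs_dim, act_dim, latent_dim, rng):
        self.latent_dim = latent_dim
        self.w = rand_matrix(latent_dim, latent_dim + act_dim + obs_dim, rng)
        # Several decoder heads read the same latent state (names are illustrative).
        self.heads = {
            "semantic": rand_matrix(4, latent_dim, rng),
            "reconstruction": rand_matrix(obs_dim, latent_dim, rng),
        }

    def step(self, latent, prev_action, obs):
        """Auto-regressive update: the new latent depends on the previous one."""
        return [math.tanh(v) for v in linear(self.w, latent + prev_action + obs)]

    def decode(self, latent):
        return {name: linear(w, latent) for name, w in self.heads.items()}


class Policy:
    """Separate action head that only sees the latent state."""

    def __init__(self, latent_dim, act_dim, rng):
        self.w = rand_matrix(act_dim, latent_dim, rng)

    def act(self, latent):
        return [math.tanh(v) for v in linear(self.w, latent)]


def rollout(model, policy, observations, act_dim):
    """Feed observations one by one, carrying the latent state forward."""
    latent = [0.0] * model.latent_dim
    action = [0.0] * act_dim
    actions = []
    for obs in observations:
        latent = model.step(latent, action, obs)
        action = policy.act(latent)
        actions.append(action)
    return latent, actions


def world_model_loss(model, latent, obs):
    """Needs only observations, so data without expert actions can be used."""
    return mse(model.decode(latent)["reconstruction"], obs)


def policy_loss(policy, latent, expert_action):
    """Needs an expert (supervisory) action as the target."""
    return mse(policy.act(latent), expert_action)


rng = random.Random(0)
model = WorldModel(obs_dim=3, act_dim=2, latent_dim=4, rng=rng)
policy = Policy(latent_dim=4, act_dim=2, rng=rng)
obs_seq = [[0.1, 0.2, 0.3], [0.0, 0.5, -0.1]]
latent, actions = rollout(model, policy, obs_seq, act_dim=2)
print(len(latent), len(actions))
```

Because the policy reads only the latent state, the world model and its decoder heads can in principle be fitted first on data that carries no expert actions, with the policy fitted afterwards on data that does, which is the separation the third point describes.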

🔎 Similar Papers

Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models