🤖 AI Summary
This work addresses two problems in existing navigation world models: predictions drift during multi-step rollouts because action conditioning is inconsistent, and the mismatch between training and few-step diffusion inference degrades planning performance and deployment efficiency. The authors propose MWM, a mobile world model for planning-based image-goal navigation, trained with a two-stage framework: structure pretraining followed by Action-Conditioned Consistency (ACC) post-training, which improves the consistency of action-conditioned multi-step predictions. They also introduce Inference-Consistent State Distillation (ICSD), a few-step diffusion distillation method that bridges the gap between full-step training and few-step inference. Experiments on benchmark and real-world tasks show consistent gains in visual fidelity, trajectory accuracy, planning success, and inference efficiency.
📝 Abstract
World models enable planning in an imagined space of predicted futures, offering a promising framework for embodied navigation. However, existing navigation world models often lack action-conditioned consistency, so visually plausible predictions can still drift during multi-step rollouts and degrade planning. Moreover, efficient deployment requires few-step diffusion inference, but existing distillation methods do not explicitly preserve rollout consistency, creating a training-inference mismatch. To address these challenges, we propose MWM, a mobile world model for planning-based image-goal navigation. Specifically, we introduce a two-stage training framework that combines structure pretraining with Action-Conditioned Consistency (ACC) post-training to improve action-conditioned rollout consistency. We further introduce Inference-Consistent State Distillation (ICSD) for few-step diffusion distillation with improved rollout consistency. Our experiments on benchmark and real-world tasks demonstrate consistent gains in visual fidelity, trajectory accuracy, planning success, and inference efficiency. Code: https://github.com/AIGeeksGroup/MWM. Website: https://aigeeksgroup.github.io/MWM.
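To make "planning in imagined space" concrete, the following is a minimal sketch of sampling-based planning with a world model. It is not the MWM planner: `toy_world_model`, `rollout`, `plan`, the 2-D state, and the squared-distance cost are all hypothetical stand-ins chosen for illustration. In image-goal navigation the state would be a predicted image and the cost a visual distance to the goal image.

```python
import random


def toy_world_model(state, action):
    """Hypothetical stand-in for a learned world model.

    State and action are (x, y) tuples; the "prediction" simply adds
    the action to the state. A real model would predict future frames.
    """
    return (state[0] + action[0], state[1] + action[1])


def rollout(model, state, actions):
    """Apply the model step by step, feeding each prediction back in."""
    for action in actions:
        state = model(state, action)
    return state


def plan(model, start, goal, horizon=5, num_candidates=256, seed=0):
    """Sample random action sequences and keep the one whose imagined
    final state lands closest to the goal (assumed squared-distance cost)."""
    rng = random.Random(seed)
    best_cost, best_actions = float("inf"), None
    for _ in range(num_candidates):
        actions = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(horizon)]
        final = rollout(model, start, actions)
        cost = (final[0] - goal[0]) ** 2 + (final[1] - goal[1]) ** 2
        if cost < best_cost:
            best_cost, best_actions = cost, actions
    return best_actions, best_cost


start, goal = (0.0, 0.0), (2.0, 1.0)
best_actions, best_cost = plan(toy_world_model, start, goal)
# Cost of not moving at all, for comparison with the planned sequence.
stay_cost = (goal[0] - start[0]) ** 2 + (goal[1] - start[1]) ** 2
```

Because every candidate is scored on a multi-step rollout, any per-step prediction error is fed back into the next step and can compound over the horizon, which is why rollout consistency matters for the quality of the selected plan.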