MWM: Mobile World Models for Action-Conditioned Consistent Prediction

📅 2026-03-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses prediction drift in existing navigation world models during multi-step rollouts, caused by inconsistent action conditioning, as well as the mismatch between full-step training and few-step diffusion inference that degrades planning performance and deployment efficiency. To tackle these issues, the authors propose MWM, a world model built on a two-stage training framework: structure pretraining followed by post-training with Action-Conditioned Consistency (ACC), a mechanism integrated into the world model for the first time to improve multi-step prediction consistency. They further introduce Inference-Consistent State Distillation (ICSD) to bridge the gap between full-step training and few-step inference. Experiments show that MWM significantly outperforms baseline methods in visual fidelity, trajectory accuracy, planning success rate, and inference efficiency.
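The summary contrasts full-step diffusion training with few-step inference. As background (not the paper's code), a hedged sketch of how few-step samplers typically operate: the denoiser is evaluated on only a small subset of the full training timestep grid. The uniform-stride schedule and all numbers below are illustrative assumptions, not MWM's actual ICSD procedure.

```python
# Hedged sketch: selecting a few inference timesteps from a full diffusion
# training schedule. A common uniform-stride choice; MWM's ICSD additionally
# distills the model so the few-step rollout stays consistent with the
# full-step one, which is not shown here.

def few_step_schedule(num_train_steps, num_infer_steps):
    """Pick num_infer_steps timesteps out of [0, num_train_steps),
    in descending order, as few-step samplers commonly do."""
    stride = num_train_steps // num_infer_steps
    steps = list(range(num_train_steps - 1, -1, -stride))[:num_infer_steps]
    return steps

# e.g. 4 denoiser evaluations instead of 1000
schedule = few_step_schedule(1000, 4)
print(schedule)  # [999, 749, 499, 249]
```

The training-inference mismatch the paper targets arises because a model trained to denoise well at every one of the 1000 steps is not guaranteed to produce consistent rollouts when queried at only these 4.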

📝 Abstract
World models enable planning in imagined future predicted space, offering a promising framework for embodied navigation. However, existing navigation world models often lack action-conditioned consistency, so visually plausible predictions can still drift under multi-step rollout and degrade planning. Moreover, efficient deployment requires few-step diffusion inference, but existing distillation methods do not explicitly preserve rollout consistency, creating a training-inference mismatch. To address these challenges, we propose MWM, a mobile world model for planning-based image-goal navigation. Specifically, we introduce a two-stage training framework that combines structure pretraining with Action-Conditioned Consistency (ACC) post-training to improve action-conditioned rollout consistency. We further introduce Inference-Consistent State Distillation (ICSD) for few-step diffusion distillation with improved rollout consistency. Our experiments on benchmark and real-world tasks demonstrate consistent gains in visual fidelity, trajectory accuracy, planning success, and inference efficiency. Code: https://github.com/AIGeeksGroup/MWM. Website: https://aigeeksgroup.github.io/MWM.
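The abstract's central claim is that visually plausible predictions can still drift under multi-step rollout. A toy numerical sketch of this compounding effect, with invented linear dynamics and an invented per-step model bias (nothing here comes from the paper):

```python
# Toy illustration of rollout drift: a learned one-step model with a small
# systematic error diverges from the true trajectory as the rollout deepens.
# Dynamics and constants are made up for illustration only.

def true_step(state, action):
    # ground-truth one-step dynamics
    return 0.9 * state + action

def model_step(state, action, bias=0.02):
    # learned model: same dynamics plus a small per-step prediction error
    return 0.9 * state + action + bias

def rollout(step_fn, state, actions):
    """Autoregressively apply step_fn, feeding each prediction back in."""
    states = [state]
    for a in actions:
        state = step_fn(state, a)
        states.append(state)
    return states

actions = [0.1] * 20
truth = rollout(true_step, 0.0, actions)
pred = rollout(model_step, 0.0, actions)
errors = [abs(p - t) for p, t in zip(pred, truth)]
# error after 1 step equals the per-step bias; after 20 steps it has
# accumulated to several times that
print(errors[1], errors[-1])
```

This is the failure mode that action-conditioned consistency training is meant to suppress: each step looks fine in isolation, but the accumulated trajectory error degrades planning.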
Problem

Research questions and friction points this paper is trying to address.

world models
action-conditioned consistency
rollout drift
diffusion distillation
embodied navigation
Innovation

Methods, ideas, or system contributions that make the work stand out.

World Models
Action-Conditioned Consistency
Diffusion Distillation
Inference-Consistent State Distillation
Embodied Navigation
Han Yan
School of Computer Science, Peking University
Zishang Xiang
School of Computer Science, Peking University
Zeyu Zhang
Gaoling School of Artificial Intelligence, Renmin University of China
Hao Tang
Peking University