🤖 AI Summary
This work addresses the perceptual and geometric drift inherent in conventional rollout-based navigation world models by proposing an anchor-guided, segment-wise generation paradigm. The approach first predicts sparse future anchor states that serve as stable long-range targets, then synthesizes the intermediate frames of each segment conditioned on both historical context and these anchors. The anchors also supply geometric constraints, based on bidirectional epipolar geometry, that indicate where corresponding content should appear in the intermediate frames. This design mitigates the error accumulation and motion inconsistency of recursive generation. On four benchmarks, the method consistently improves over strong baselines in long-horizon visual quality, geometric consistency, and multi-view coherence, and these gains carry over to downstream planning performance with the same planners.
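The anchor-then-fill ordering described above can be illustrated with a small scheduling sketch. It covers only the order in which frames would be produced, not the generative model itself. The function name `anchor_schedule` and the fixed `stride` between anchors are assumptions made for illustration; the summary does not say how anchors are spaced.

```python
from typing import List, Tuple


def anchor_schedule(horizon: int, stride: int) -> Tuple[List[int], List[List[int]]]:
    """Split future steps 1..horizon into sparse anchors and per-segment fill-ins.

    Hypothetical helper: a fixed stride is assumed purely for illustration.
    Frame 0 is the last observed frame and is never generated.
    """
    if horizon < 1 or stride < 1:
        raise ValueError("horizon and stride must be positive")

    # Stage 1: sparse anchors, predicted before any intermediate frame.
    anchors = list(range(stride, horizon + 1, stride))
    if not anchors or anchors[-1] != horizon:
        anchors.append(horizon)  # make sure the final step is covered

    # Stage 2: each segment is filled in between its two bounding frames,
    # so no intermediate frame depends on another generated intermediate frame.
    segments: List[List[int]] = []
    start = 0
    for anchor in anchors:
        segments.append(list(range(start + 1, anchor)))
        start = anchor
    return anchors, segments


if __name__ == "__main__":
    anchors, segments = anchor_schedule(horizon=10, stride=4)
    print("anchors:", anchors)
    print("segments:", segments)
```

In this sketch every intermediate frame is bounded by a known frame on each side, which is the property that lets conditioning on both past context and a future anchor limit error accumulation.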
📝 Abstract
We propose the Drift-Resistant Navigation World Model, a generative model that mitigates both perceptual drift and geometric drift in conventional rollout-based navigation world models. Existing methods recursively feed generated content into subsequent steps, causing noise accumulation and degraded predictions, i.e., perceptual drift. Meanwhile, their predictions often deviate from the agent's motion, resulting in geometric drift. We address both types of drift by redesigning world-model prediction as an anchor-guided rollout. Instead of rolling out every frame sequentially, we first predict sparse future anchors that serve as stable long-range targets, and then generate the intermediate frames within each chunk conditioned on both past context and future anchors. Importantly, these sparse anchors also provide geometric constraints, supported by bidirectional epipolar geometry, that localize where corresponding content should appear in the intermediate frames. Experiments on four benchmarks demonstrate consistent improvements over strong baselines in long-horizon visual quality, geometric consistency, and multi-view coherence. These gains further translate into improved downstream planning performance under the same planners, highlighting the importance of drift-resistant, geometry-aware prediction for reliable navigation world models.
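To make the epipolar constraint concrete, the following sketch uses standard two-view geometry in normalized image coordinates. Given the relative pose from a reference view to an intermediate view, a point in the reference view maps to a line in the intermediate view on which its match must lie. Reading "bidirectional" as one line from the past frame and one from the future anchor, whose intersection localizes the content, is an assumption made for illustration and not necessarily the paper's formulation. All function names and the example poses are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = Sequence[float]
Mat = List[List[float]]


def skew(t: Vec) -> Mat:
    """Cross-product matrix [t]_x, so that skew(t) @ v == t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]


def matmul(a: Mat, b: Mat) -> Mat:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def matvec(a: Mat, v: Vec) -> List[float]:
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def essential_matrix(R: Mat, t: Vec) -> Mat:
    """E = [t]_x R for the convention X_target = R @ X_source + t."""
    return matmul(skew(t), R)


def epipolar_line(E: Mat, x_source: Vec) -> List[float]:
    """Line (a, b, c) in the target view containing the match of x_source."""
    return matvec(E, [x_source[0], x_source[1], 1.0])


def point_line_distance(line: Vec, x: Vec) -> float:
    a, b, c = line
    return abs(a * x[0] + b * x[1] + c) / math.hypot(a, b)


def intersect(l1: Vec, l2: Vec) -> Tuple[float, float]:
    """Intersection of two homogeneous lines (their cross product)."""
    x = l1[1] * l2[2] - l1[2] * l2[1]
    y = l1[2] * l2[0] - l1[0] * l2[2]
    w = l1[0] * l2[1] - l1[1] * l2[0]
    return x / w, y / w


def project(X: Vec) -> Tuple[float, float]:
    """Pinhole projection to normalized image coordinates."""
    return X[0] / X[2], X[1] / X[2]


# Illustrative setup (all values assumed): one 3D point seen from a past frame,
# an intermediate frame, and a future anchor, with identity rotations.
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
X_past = [0.5, -0.2, 4.0]
t_past_to_mid = [0.0, 0.0, -1.0]   # agent moved 1 unit forward
t_anchor_to_mid = [0.4, 0.0, 1.0]  # anchor is further ahead and to the right
X_mid = [X_past[i] + t_past_to_mid[i] for i in range(3)]
X_anchor = [X_mid[i] - t_anchor_to_mid[i] for i in range(3)]

line_from_past = epipolar_line(essential_matrix(I3, t_past_to_mid), project(X_past))
line_from_anchor = epipolar_line(essential_matrix(I3, t_anchor_to_mid), project(X_anchor))
localized = intersect(line_from_past, line_from_anchor)
```

Each line alone constrains the content to a one-dimensional locus in the intermediate frame. With two references whose epipolar lines are not parallel, the intersection pins it to a point, which is one way sparse anchors could act as geometric constraints in addition to appearance targets.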