🤖 AI Summary
Existing world models for autonomous driving struggle to accurately capture the evolution of dynamic scenes under action conditioning, leading to unreliable planning. This work proposes an action-conditioned velocity field modeling approach based on rectified flow, which progressively predicts future states in latent space. To enhance planning robustness, we further introduce a stability-aware multimodal trajectory evaluation strategy. Our method overcomes limitations inherent in conventional generative or regression paradigms for dynamic scene modeling and achieves significant performance gains across multiple planning frameworks on the nuScenes and NavSim benchmarks, without incurring additional inference overhead.
📝 Abstract
Recently, world models have been incorporated into autonomous driving systems to improve planning reliability. Existing approaches typically predict future states through appearance generation or deterministic regression, which limits their ability to capture trajectory-conditioned scene evolution and leads to unreliable action planning. To address this, we propose DynFlowDrive, a latent world model that leverages flow-based dynamics to model the transition of world states under different driving actions. By adopting the rectified flow formulation, the model learns a velocity field that describes how the scene state changes given a driving action, enabling progressive prediction of future latent states. Building upon this, we further introduce a stability-aware multi-mode trajectory selection strategy that evaluates candidate trajectories according to the stability of the scene transitions they induce. Extensive experiments on the nuScenes and NavSim benchmarks demonstrate consistent improvements across diverse driving frameworks without introducing additional inference overhead. Source code will be available at https://github.com/xiaolul2/DynFlowDrive.
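To make the idea of an action-conditioned velocity field concrete, the following is a minimal NumPy sketch and not the DynFlowDrive implementation. It shows the standard rectified-flow training target (a straight-line interpolation between two states with constant velocity), a progressive Euler rollout of a latent state under a given action, and a candidate-selection step. The function names, the toy velocity field, and in particular the `instability` score are assumptions made for illustration; the abstract does not specify how stability is measured.

```python
import numpy as np


def rectified_flow_target(z0, z1, t):
    """Standard rectified-flow pair: point on the straight path and its velocity.

    z_t = (1 - t) * z0 + t * z1, with constant target velocity z1 - z0.
    """
    z_t = (1.0 - t) * z0 + t * z1
    return z_t, z1 - z0


def toy_velocity(z, action, t):
    """Hypothetical stand-in for a learned velocity field v(z, t | action).

    A real model would be a neural network; this toy field simply pulls the
    latent state toward an action-dependent target.
    """
    return np.tanh(action) - z


def rollout(z0, action, velocity_fn, num_steps=4):
    """Euler-integrate dz/dt = v(z, t | action) from t = 0 to t = 1.

    Returns an array of shape (num_steps + 1, dim) holding every
    intermediate latent state, i.e. the progressive prediction.
    """
    dt = 1.0 / num_steps
    z = np.asarray(z0, dtype=float)
    states = [z]
    for k in range(num_steps):
        z = z + dt * velocity_fn(z, action, k * dt)
        states.append(z)
    return np.stack(states)


def instability(states):
    """Assumed stability proxy: mean squared change between consecutive steps.

    Lower values mean the rollout moves more uniformly. This is an
    illustrative choice, not the criterion used in the paper.
    """
    steps = np.diff(states, axis=0)
    return float(np.mean(np.diff(steps, axis=0) ** 2))


def select_trajectory(z0, candidate_actions, velocity_fn, num_steps=4):
    """Score each candidate action by rollout instability and pick the lowest."""
    scores = [
        instability(rollout(z0, a, velocity_fn, num_steps))
        for a in candidate_actions
    ]
    return int(np.argmin(scores)), scores


if __name__ == "__main__":
    z0 = np.zeros(2)
    candidates = [np.array([2.0, 2.0]), np.array([0.1, 0.1])]
    best, scores = select_trajectory(z0, candidates, toy_velocity)
    print("selected candidate:", best)
    print("instability scores:", scores)
```

In this sketch the rollout cost grows with the number of Euler steps, so it says nothing about the paper's claim of no additional inference overhead; it only illustrates how a velocity field turns a current latent state and a candidate action into a sequence of predicted states that can then be scored.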