🤖 AI Summary
This work addresses a key limitation of existing latent world models for autonomous driving: they typically treat future latent states as prediction targets or auxiliary signals rather than using them to condition trajectory planning, which can entangle current and future features in the latent space. To overcome this, the authors propose DriveFuture, a framework that explicitly conditions planning on future latent states. During training, the model predicts future latent states from the current latent state and the ego action, then refines this prediction against the ground-truth future latent state via cross-attention; the resulting future-aware latent serves as an explicit condition for a diffusion-based trajectory planner. During inference, planning is conditioned on the predicted future latent state instead of the ground truth. DriveFuture reaches 55.5 EPDMS on NAVSIM-v2 navhard, 89.9 EPDMS on NAVSIM-v2 navtest, and 90.7 PDMS on NAVSIM-v1 navtest. As of April 2026, it ranks 1st on the NAVSIM-v2 navhard leaderboard and achieves state-of-the-art performance on the NAVSIM-v1 navtest leaderboard.
📝 Abstract
Existing latent world models for autonomous driving have opened a promising path toward future-aware driving intelligence. However, they typically treat future latent states as prediction targets or auxiliary signals rather than as direct conditions for trajectory planning. This can entangle current and future features in latent space. In this work, we propose DriveFuture, a future-aware latent world modeling framework for autonomous driving that explicitly learns planning-oriented foresight by conditioning the modeling of the current latent state on future world states. Specifically, during training, the model first predicts future latent world states from the current latent state and the ego action, and then refines the prediction against the ground-truth future latent state via cross-attention. The resulting future-aware latent serves as an explicit condition for a diffusion-based trajectory planner. During inference, DriveFuture conditions on the predicted future latent state instead of the ground-truth future state. DriveFuture achieves state-of-the-art (SOTA) performance on the public NAVSIM benchmarks, reaching 55.5 EPDMS on NAVSIM-v2 navhard, 89.9 EPDMS on NAVSIM-v2 navtest, and 90.7 PDMS on NAVSIM-v1 navtest. These results suggest that the key to latent world modeling lies not merely in simulating future states but, more importantly, in conditioning current decision-making on them. Notably, as of April 2026, DriveFuture ranks 1st on the NAVSIM-v2 navhard leaderboard (https://huggingface.co/spaces/AGC2025/e2e-driving-navhard) and achieves SOTA performance on the NAVSIM-v1 navtest leaderboard (https://huggingface.co/spaces/AGC2024-P/e2e-driving-navtest).
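The training/inference switch described in the abstract can be illustrated with a small sketch. This is a minimal NumPy toy under stated assumptions, not the authors' code: the linear future predictor (`A`, `B`), the single-head cross-attention in which current latent tokens query the future latent, the residual update, the mean-squared prediction loss, and the class name `FutureAwareLatentSketch` are all hypothetical choices that the abstract does not specify. The diffusion planner itself is omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query, context, w_q, w_k, w_v):
    # Single-head scaled dot-product cross-attention:
    # each query token attends over all context tokens.
    q = query @ w_q                          # (n_q, d)
    k = context @ w_k                        # (n_c, d)
    v = context @ w_v                        # (n_c, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (n_q, n_c)
    return softmax(scores, axis=-1) @ v      # (n_q, d)


class FutureAwareLatentSketch:
    """Hypothetical toy module illustrating future-conditioned latents.

    Not the DriveFuture implementation: the linear world model, the
    single-head attention, and the residual update are assumptions.
    """

    def __init__(self, d_model, d_action, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(d_model)
        # Hypothetical linear "world model": future = current @ A + action @ B.
        self.A = rng.normal(0.0, s, (d_model, d_model))
        self.B = rng.normal(0.0, s, (d_action, d_model))
        # Projection matrices for the cross-attention block.
        self.w_q = rng.normal(0.0, s, (d_model, d_model))
        self.w_k = rng.normal(0.0, s, (d_model, d_model))
        self.w_v = rng.normal(0.0, s, (d_model, d_model))

    def predict_future(self, current, action):
        # current: (n_tokens, d_model); action: (d_action,), broadcast to every token.
        return current @ self.A + action @ self.B

    def forward(self, current, action, gt_future=None):
        pred_future = self.predict_future(current, action)
        # Training: attend to the ground-truth future latent.
        # Inference: no ground truth exists, so attend to the prediction instead.
        context = gt_future if gt_future is not None else pred_future
        # Residual update: current tokens refined by what they read from the future.
        future_aware = current + cross_attention(
            current, context, self.w_q, self.w_k, self.w_v)
        return pred_future, future_aware


# Usage with random stand-in tensors (8 latent tokens, 16 channels, 3-D ego action).
rng = np.random.default_rng(1)
model = FutureAwareLatentSketch(d_model=16, d_action=3)
current = rng.normal(size=(8, 16))
action = rng.normal(size=3)
gt_future = rng.normal(size=(8, 16))

pred_train, cond_train = model.forward(current, action, gt_future)  # training mode
future_loss = np.mean((pred_train - gt_future) ** 2)  # assumed prediction loss
pred_infer, cond_infer = model.forward(current, action)             # inference mode
# cond_train / cond_infer would condition a diffusion planner (not shown).
print(cond_infer.shape)  # prints (8, 16)
```

The only difference between the two modes in this sketch is which tensor supplies the keys and values of the cross-attention, which mirrors the abstract's statement that inference conditions on the predicted rather than the ground-truth future state. If the predictor were perfect, the two conditioning signals would coincide.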