🤖 AI Summary
This work addresses the high cost and safety risk of training reinforcement learning policies for autonomous driving on real-world data. Existing pixel-level diffusion world models allow safe imagination-based training, but their multi-step inference latency (about 2 seconds per frame) prevents high-frequency RL interaction. The authors propose DreamerAD, a latent-space world model built on three mechanisms: shortcut forcing, which reduces sampling complexity through recursive multi-resolution step compression; an autoregressive dense reward model that operates directly on latent representations for fine-grained credit assignment; and Gaussian vocabulary sampling for GRPO, which constrains exploration to physically plausible trajectories. The framework compresses diffusion sampling from 100 steps to a single step, an 80× speedup, while preserving visual interpretability. On NavSim v2, DreamerAD achieves a state-of-the-art 87.7 EPDMS, demonstrating that latent-space reinforcement learning is effective for autonomous driving.
📝 Abstract
We introduce DreamerAD, the first latent world model framework that enables efficient reinforcement learning for autonomous driving by compressing diffusion sampling from 100 steps to 1, achieving an 80x speedup while maintaining visual interpretability. Training RL policies on real-world driving data incurs prohibitive costs and safety risks. While existing pixel-level diffusion world models enable safe imagination-based training, they suffer from multi-step diffusion inference latency (2 s/frame) that prevents high-frequency RL interaction. Our approach leverages denoised latent features from video generation models through three key mechanisms: (1) shortcut forcing that reduces sampling complexity via recursive multi-resolution step compression, (2) an autoregressive dense reward model operating directly on latent representations for fine-grained credit assignment, and (3) Gaussian vocabulary sampling for GRPO that constrains exploration to physically plausible trajectories. DreamerAD achieves 87.7 EPDMS on NavSim v2, establishing state-of-the-art performance and demonstrating that latent-space RL is effective for autonomous driving.
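The abstract does not spell out how mechanism (3) is implemented, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes a fixed vocabulary of candidate trajectories, a policy-predicted reference trajectory, and sampling weights that fall off as a Gaussian of each candidate's squared distance to that reference. The group-normalized advantage is the standard GRPO form (reward minus group mean, divided by group standard deviation). All function names and the `sigma` parameter are hypothetical.

```python
import math
import random


def gaussian_vocab_probs(vocab, center, sigma):
    """Probability of each vocabulary trajectory under a Gaussian kernel.

    vocab:  list of trajectories, each a list of (x, y) waypoints (assumed layout)
    center: the reference trajectory the policy predicted (assumed input)
    sigma:  kernel width; smaller values keep samples closer to the reference
    """
    sq_dists = [
        sum((x - cx) ** 2 + (y - cy) ** 2 for (x, y), (cx, cy) in zip(traj, center))
        for traj in vocab
    ]
    logits = [-d / (2.0 * sigma ** 2) for d in sq_dists]
    max_logit = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - max_logit) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


def sample_group(vocab, center, sigma, group_size, rng):
    """Draw a group of vocabulary indices; every sample is a vocabulary entry."""
    probs = gaussian_vocab_probs(vocab, center, sigma)
    return rng.choices(range(len(vocab)), weights=probs, k=group_size)


def group_relative_advantages(rewards, eps=1e-8):
    """Standard GRPO advantage: normalize rewards within the sampled group."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]
```

Because every sample is drawn from the vocabulary, exploration never leaves the set of candidate trajectories, which is one plausible reading of "constrains exploration to physically plausible trajectories". In this sketch the rewards passed to `group_relative_advantages` would come from a reward model scoring each sampled trajectory.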