AI Summary
Diffusion models for autonomous driving planning suffer from high latency due to iterative sampling, and modeling trajectories directly in raw coordinate space yields weak high-level semantic representations, often collapsing into low-level kinematic patterns. To address this, we propose a latent-space single-step denoising planning framework. First, a disentangled VAE constructs a low-dimensional planning latent space that explicitly separates semantic intent from motion dynamics. Second, a single-step diffusion denoiser operates within this latent space, eliminating the need for iterative sampling. Third, a fine-grained scene feature distillation mechanism explicitly aligns high-level planning decisions with contextual semantic cues. Evaluated on the closed-loop nuPlan benchmark, our method achieves state-of-the-art performance among learning-based planners, accelerating inference by up to 10× over prior diffusion-based approaches while preserving multimodality, planning efficiency, and semantic consistency.
Abstract
Diffusion models have demonstrated strong capabilities for modeling human-like driving behaviors in autonomous driving, but their iterative sampling process induces substantial latency, and operating directly on raw trajectory points forces the model to spend capacity on low-level kinematics rather than high-level multi-modal semantics. To address these limitations, we propose LAtent Planner (LAP), a framework that plans in a VAE-learned latent space disentangling high-level intents from low-level kinematics, enabling the planner to capture rich, multi-modal driving strategies. We further introduce a fine-grained feature distillation mechanism that guides interaction and fusion between the high-level semantic planning space and the vectorized scene context. Notably, LAP produces high-quality plans in a single denoising step, substantially reducing computational overhead. In extensive evaluations on the large-scale nuPlan benchmark, LAP achieves state-of-the-art closed-loop performance among learning-based planning methods while delivering up to a 10× inference speed-up over previous SOTA approaches.
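To make the pipeline concrete, here is a minimal sketch of the plan-in-latent-space idea: a VAE compresses trajectories into a low-dimensional latent, and a single-step denoiser maps a pure-noise latent plus scene features directly to a clean latent, which the decoder turns back into a trajectory. All names, dimensions, and the linear stand-in networks are illustrative assumptions, not the paper's actual architecture; the point is the control flow (one denoising pass per candidate plan, no iterative sampling loop).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (assumptions for illustration): e.g. 40 future
# waypoints x (x, y) -> an 80-d raw trajectory, compressed to an 8-d
# planning latent, conditioned on a 16-d vectorized scene feature.
TRAJ_DIM, LATENT_DIM, SCENE_DIM = 80, 8, 16

# Linear stand-ins for the learned VAE encoder/decoder and the denoiser.
W_enc = rng.normal(size=(LATENT_DIM, TRAJ_DIM)) * 0.1
W_dec = np.linalg.pinv(W_enc)  # decoder approximately inverts the encoder
W_den = rng.normal(size=(LATENT_DIM, LATENT_DIM + SCENE_DIM)) * 0.1

def encode(traj: np.ndarray) -> np.ndarray:
    """Compress a raw trajectory into the planning latent."""
    return W_enc @ traj

def decode(z: np.ndarray) -> np.ndarray:
    """Map a planning latent back to a raw trajectory."""
    return W_dec @ z

def denoise_one_step(z_noisy: np.ndarray, scene: np.ndarray) -> np.ndarray:
    # Single forward pass: predict the clean latent directly from the
    # (noisy latent, scene features) pair -- no iterative sampling.
    return W_den @ np.concatenate([z_noisy, scene])

def plan(scene: np.ndarray, n_modes: int = 3) -> np.ndarray:
    # Different noise seeds yield multi-modal candidate plans, each
    # produced by exactly one denoising step.
    plans = []
    for _ in range(n_modes):
        z0 = rng.normal(size=LATENT_DIM)       # pure-noise latent
        z_clean = denoise_one_step(z0, scene)  # one-step denoising
        plans.append(decode(z_clean))          # back to trajectory space
    return np.stack(plans)

scene = rng.normal(size=SCENE_DIM)
candidates = plan(scene)
print(candidates.shape)  # (3, 80): three candidate trajectories
```

The speed advantage in the abstract comes from exactly this structure: a diffusion planner normally loops `denoise_one_step` many times per plan, whereas here each candidate costs one forward pass in a space far smaller than the raw trajectory space.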