LAP: Fast LAtent Diffusion Planner with Fine-Grained Feature Distillation for Autonomous Driving

πŸ“… 2025-11-29
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Diffusion models for autonomous driving planning suffer from high latency due to iterative sampling, and from weak high-level semantic representation when modeling trajectories directly in raw trajectory space, where they often collapse into low-level kinematic patterns. To address this, we propose a latent-space, single-step denoising planning framework. First, a disentangled VAE constructs a low-dimensional planning latent space that explicitly separates semantic intent from motion dynamics. Second, a single-step diffusion denoiser operates within this latent space, eliminating the need for iterative sampling. Third, a fine-grained scene feature distillation mechanism explicitly aligns high-level planning decisions with contextual semantic cues. Evaluated on the closed-loop nuPlan benchmark, our method achieves state-of-the-art performance among learning-based planners and accelerates inference by up to 10× over prior diffusion-based approaches, while preserving multimodality and semantic consistency.
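The pipeline above (compress trajectories into a disentangled latent, then denoise in one conditioned step instead of iterating) can be sketched in a toy form. Everything here is illustrative: the dimensions, the linear "VAE", and the conditioning map are assumptions standing in for the paper's learned networks, not the actual LAP implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the paper): an 80-waypoint (x, y) trajectory
# compressed into a small planning latent whose first dims carry semantic
# intent and the rest carry motion dynamics.
TRAJ_DIM, INTENT_DIM, KINE_DIM = 160, 4, 12
LATENT_DIM = INTENT_DIM + KINE_DIM

# Toy linear stand-ins for the learned VAE encoder/decoder.
W_enc = rng.normal(size=(LATENT_DIM, TRAJ_DIM)) / np.sqrt(TRAJ_DIM)
W_dec = np.linalg.pinv(W_enc)

def encode(traj):
    z = W_enc @ traj
    return z[:INTENT_DIM], z[INTENT_DIM:]  # (intent, kinematics) split

def decode(z):
    return W_dec @ z

# Single-step denoising: rather than looping over T reverse-diffusion
# steps, one forward pass maps noise + scene context to a clean latent.
SCENE_DIM = 8
W_cond = rng.normal(size=(LATENT_DIM, SCENE_DIM)) * 0.1

def denoise_one_step(noise, scene_feat):
    # Toy denoiser: the output is driven entirely by the scene
    # conditioning; a real model would also transform the noise.
    return 0.0 * noise + W_cond @ scene_feat

scene = rng.normal(size=SCENE_DIM)
z0 = denoise_one_step(rng.normal(size=LATENT_DIM), scene)
traj = decode(z0)
print(traj.shape)  # (160,)
```

The point of the sketch is the control flow, not the math: sampling cost is one denoiser call plus one decoder call, which is where the claimed latency reduction over iterative diffusion planners comes from.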

πŸ“ Abstract
Diffusion models have demonstrated strong capabilities for modeling human-like driving behaviors in autonomous driving, but their iterative sampling process induces substantial latency, and operating directly on raw trajectory points forces the model to spend capacity on low-level kinematics rather than high-level multi-modal semantics. To address these limitations, we propose LAtent Planner (LAP), a framework that plans in a VAE-learned latent space that disentangles high-level intents from low-level kinematics, enabling our planner to capture rich, multi-modal driving strategies. We further introduce a fine-grained feature distillation mechanism to guide richer interaction and fusion between the high-level semantic planning space and the vectorized scene context. Notably, LAP can produce high-quality plans in a single denoising step, substantially reducing computational overhead. Through extensive evaluations on the large-scale nuPlan benchmark, LAP achieves state-of-the-art closed-loop performance among learning-based planning methods, while demonstrating an inference speed-up of up to 10× over previous SOTA approaches.
Problem

Research questions and friction points this paper is trying to address.

Reduces latency in diffusion-based autonomous driving planners
Disentangles high-level intents from low-level kinematics for planning
Improves interaction between semantic planning and scene context
Innovation

Methods, ideas, or system contributions that make the work stand out.

Plans in VAE-learned latent space for intent disentanglement
Uses fine-grained feature distillation for semantic-context fusion
Produces high-quality plans in single denoising step
πŸ”Ž Similar Papers
No similar papers found.
Authors
Jinhao Zhang (Harbin Institute of Technology, Shenzhen): Autonomous Driving, Embodied AI, Generative Model
Wenlong Xia (Harbin Institute of Technology, Shenzhen)
Zhexuan Zhou (Harbin Institute of Technology, Shenzhen): Robotics
Youmin Gong (Harbin Institute of Technology, Shenzhen)
Jie Mei (Harbin Institute of Technology, Shenzhen)