HE-Drive: Human-Like End-to-End Driving with Vision Language Models

📅 2024-10-07
🏛️ arXiv.org
📈 Citations: 17
Influential: 0
🤖 AI Summary
Existing imitation learning methods produce autonomous driving trajectories that are temporally inconsistent and uncomfortable for passengers. This paper proposes the first human-centric end-to-end system: it employs sparse 3D perception to extract lightweight spatial representations; introduces a conditional denoising diffusion probabilistic model (DDPM) to generate temporally consistent, multimodally plausible trajectories; and integrates vision-language models (VLMs) into trajectory scoring for comfort-driven decision-making. Combining sparse perception with the conditional DDPM improves trajectory plausibility and temporal coherence while keeping inference efficient. Evaluated on nuScenes and OpenScene, it reduces the average collision rate by 71% relative to VAD and runs 1.9× faster than SparseDrive. It also delivers the most comfortable driving experience on real-world data.

📝 Abstract
In this paper, we propose HE-Drive, the first human-like-centric end-to-end autonomous driving system, which generates trajectories that are both temporally consistent and comfortable. Recent studies have shown that imitation learning-based planners and learning-based trajectory scorers can effectively generate and select accurate trajectories that closely mimic expert demonstrations. However, such trajectory planners and scorers still generate temporally inconsistent and uncomfortable trajectories. To solve these problems, HE-Drive first extracts key 3D spatial representations through sparse perception, which then serve as conditional inputs for a motion planner based on conditional Denoising Diffusion Probabilistic Models (DDPMs) that generates temporally consistent, multi-modal trajectories. A Vision-Language Model (VLM)-guided trajectory scorer subsequently selects the most comfortable trajectory from these candidates to control the vehicle, ensuring human-like end-to-end driving. Experiments show that HE-Drive not only achieves state-of-the-art performance (i.e., reduces the average collision rate by 71% compared with VAD) and efficiency (i.e., 1.9× faster than SparseDrive) on the challenging nuScenes and OpenScene datasets but also provides the most comfortable driving experience on real-world data. For more information, visit the project website: https://jmwang0117.github.io/HE-Drive/.
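The conditional DDPM planner mentioned in the abstract can be illustrated with the standard DDPM reverse (denoising) update. The sketch below is not HE-Drive's implementation: the linear noise schedule, the step count, and every function name are assumptions made for illustration, and `toy_noise_predictor` is a hypothetical closed-form stand-in for the learned network that would consume the sparse-perception features as its condition. That stand-in is the exact noise predictor for the degenerate case where every training trajectory equals the conditioning vector, so sampling collapses onto the condition; a trained model would instead yield diverse, multi-modal candidates.

```python
import math
import random


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule and cumulative products of alpha_t = 1 - beta_t."""
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1)
             for i in range(num_steps)]
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return betas, alpha_bars


def toy_noise_predictor(x, alpha_bar, condition):
    """Hypothetical stand-in for a learned eps_theta(x_t, t, c).

    Assumes all training data equals `condition`, in which case
    x_t = sqrt(alpha_bar) * c + sqrt(1 - alpha_bar) * eps, so eps can be
    solved for exactly. A real planner would use a trained network here.
    """
    root_ab = math.sqrt(alpha_bar)
    root_one_minus = math.sqrt(1.0 - alpha_bar)
    return [(xi - root_ab * ci) / root_one_minus for xi, ci in zip(x, condition)]


def sample_trajectory(condition, num_steps=50, seed=0):
    """Run the DDPM reverse process from pure noise to a flattened trajectory."""
    rng = random.Random(seed)
    betas, alpha_bars = make_schedule(num_steps)
    x = [rng.gauss(0.0, 1.0) for _ in condition]  # x_T ~ N(0, I)
    for t in reversed(range(num_steps)):
        eps = toy_noise_predictor(x, alpha_bars[t], condition)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        scale = 1.0 / math.sqrt(1.0 - betas[t])
        # No noise is added on the final step (t == 0).
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0
        x = [scale * (xi - coef * ei) + sigma * rng.gauss(0.0, 1.0)
             for xi, ei in zip(x, eps)]
    return x


# Flattened waypoints used as the (toy) conditioning vector.
sample = sample_trajectory([0.0, 1.0, 2.0, 3.0])
```

Drawing several samples with different seeds is how a diffusion planner of this kind would produce a set of candidate trajectories for a downstream scorer.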
Problem

Research questions and friction points this paper is trying to address.

Generating temporally consistent autonomous driving trajectories
Improving passenger comfort in motion planning systems
Addressing discomfort from inconsistent trajectory generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses sparse perception for 3D spatial representations
Employs conditional DDPM for multi-modal trajectory generation
Applies a VLM-guided trajectory scorer to select the most comfortable trajectory
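The final selection step, picking the most comfortable trajectory from a set of candidates, can be sketched as a minimum-cost choice. This page does not describe how the VLM-guided scorer computes its scores, so the example substitutes a simple hand-written proxy: mean squared jerk, a common measure of ride smoothness. The function names, the `dt=0.5` waypoint spacing, and the cost itself are assumptions for illustration, not the paper's method.

```python
def finite_difference(values, dt):
    """First-order finite difference of a scalar sequence sampled every dt seconds."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]


def jerk_cost(trajectory, dt=0.5):
    """Mean squared jerk of a list of (x, y) waypoints.

    Jerk is the third time derivative of position; lower values mean a
    smoother ride. This is a stand-in comfort proxy, not a VLM-based score.
    """
    if len(trajectory) < 4:
        raise ValueError("need at least 4 waypoints to estimate jerk")
    total, count = 0.0, 0
    for axis in range(2):
        positions = [point[axis] for point in trajectory]
        velocity = finite_difference(positions, dt)
        acceleration = finite_difference(velocity, dt)
        jerk = finite_difference(acceleration, dt)
        total += sum(j * j for j in jerk)
        count += len(jerk)
    return total / count


def select_most_comfortable(candidates, dt=0.5):
    """Return the candidate trajectory with the lowest jerk cost."""
    return min(candidates, key=lambda traj: jerk_cost(traj, dt))


# Two hypothetical candidates: constant speed versus stop-and-go motion.
smooth = [(0.5 * i, 0.0) for i in range(6)]
jerky = [(0.0, 0.0), (0.5, 0.0), (0.6, 0.0), (2.0, 0.0), (2.1, 0.0), (3.0, 0.0)]
best = select_most_comfortable([jerky, smooth])
```

A scorer of this shape keeps the planner and the comfort criterion decoupled: the candidate generator stays unchanged while the cost function can be replaced or reweighted.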