Navigation-Guided Sparse Scene Representation for End-to-End Autonomous Driving

📅 2024-09-26
📈 Citations: 1
Influential: 0

🤖 AI Summary
End-to-end autonomous driving (E2EAD) commonly relies on explicit perception subtasks (e.g., detection, segmentation) supervised with human annotations, which incurs high annotation costs, limits generalization, and hinders real-time deployment. To address this, we propose SSR, a framework that needs no perception annotations. It replaces dense perception with an ultra-sparse, navigation-guided scene representation of only 16 tokens that encode just the scene elements relevant to driving. Methodologically, SSR introduces a navigation-driven attention mechanism, a self-supervised temporal alignment module, and end-to-end differentiable training. On nuScenes, SSR reduces trajectory L2 error by 27.2% (relative) and collision rate by 51.6% compared with UniAD, while running inference 10.9× faster and training 13× faster. On CARLA's Town05 Long benchmark, it achieves a driving score 48.6 points higher than VAD-Base.
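
The summary does not say how 16 tokens are distilled from dense scene features. One plausible reading, sketched below, is TokenLearner-style attention pooling: a command-specific scoring matrix decides which bird's-eye-view (BEV) cells each token attends to, so the planner downstream sees 16 vectors instead of thousands of cells. The BEV layout, the per-command weights, the command encoding, and the names `navigation_guided_tokens` and `score_weights` are hypothetical; this is not SSR's released code.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def navigation_guided_tokens(bev: np.ndarray, score_weights: np.ndarray,
                             command: int) -> np.ndarray:
    """Pool a dense BEV map into a few sparse scene tokens (hypothetical sketch).

    bev:           (H*W, C) flattened BEV features (assumed layout).
    score_weights: (num_commands, C, num_tokens), one hypothetical scoring
                   matrix per navigation command.
    command:       navigation command index (the encoding is an assumption).
    Returns:       (num_tokens, C) scene tokens.
    """
    # Score every BEV cell for every token with the command-specific weights.
    logits = bev @ score_weights[command]      # (H*W, num_tokens)
    # Normalize over spatial cells so each token is a convex mix of cells.
    attn = softmax(logits, axis=0)             # (H*W, num_tokens)
    # Attention-weighted spatial pooling.
    return attn.T @ bev                        # (num_tokens, C)


# Usage with random stand-in data (shapes are illustrative, not the paper's).
rng = np.random.default_rng(0)
H, W, C, NUM_COMMANDS, NUM_TOKENS = 50, 50, 64, 3, 16
bev = rng.standard_normal((H * W, C))
score_weights = 0.1 * rng.standard_normal((NUM_COMMANDS, C, NUM_TOKENS))
tokens = navigation_guided_tokens(bev, score_weights, command=2)
print(tokens.shape)  # (16, 64)
```

Because the softmax runs over spatial cells, each token is a weighted average of BEV features, and changing the command changes which cells dominate each token.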

📝 Abstract
End-to-End Autonomous Driving (E2EAD) methods typically rely on supervised perception tasks to extract explicit scene information (e.g., objects, maps). This reliance necessitates expensive annotations and constrains deployment and data scalability in real-time applications. In this paper, we introduce SSR, a novel framework that utilizes only 16 navigation-guided tokens as Sparse Scene Representation, efficiently extracting crucial scene information for E2EAD. Our method eliminates the need for human-designed supervised sub-tasks, allowing computational resources to concentrate on essential elements directly related to navigation intent. We further introduce a temporal enhancement module, aligning predicted future scenes with actual future scenes through self-supervision. SSR achieves a 27.2% relative reduction in L2 error and a 51.6% decrease in collision rate compared with UniAD on nuScenes, with 10.9× faster inference and 13× faster training. Moreover, SSR outperforms VAD-Base with a 48.6-point improvement on driving score in CARLA's Town05 Long benchmark. This framework represents a significant leap in real-time autonomous driving systems and paves the way for future scalable deployment. Code is available at https://github.com/PeidongLi/SSR.
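
The abstract mixes three kinds of comparison: relative reductions (L2 error, collision rate), speed-up factors (inference, training), and an absolute point difference (driving score). The helpers below make that distinction explicit. The function names and every input value are placeholders chosen only to illustrate the arithmetic; they are not measurements from the paper.

```python
def relative_reduction(baseline: float, new: float) -> float:
    """Percent reduction of a lower-is-better metric relative to the baseline."""
    return 100.0 * (baseline - new) / baseline


def speedup(baseline_time: float, new_time: float) -> float:
    """How many times faster the new method is, from wall-clock times."""
    return baseline_time / new_time


def point_gain(baseline_score: float, new_score: float) -> float:
    """Absolute difference on a 0-100 score such as CARLA's driving score."""
    return new_score - baseline_score


# Placeholder inputs, NOT the paper's numbers.
print(round(relative_reduction(1.00, 0.728), 1))  # 27.2
print(speedup(109.0, 10.0))                        # 10.9
print(round(point_gain(20.0, 68.6), 1))            # 48.6
```

Read this way, a 51.6% collision-rate decrease means SSR's collision rate is 48.4% of UniAD's, while the 48.6-point driving-score gain is a difference in score, not a percentage.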
Problem

Reduces reliance on expensive human annotations for supervised perception tasks (e.g., objects, maps).
Removes human-designed supervised sub-tasks, which constrain deployment and data scalability.
Improves real-time performance and scalability for deployment.
Innovation

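Represents the scene with only 16 navigation-guided tokens (Sparse Scene Representation).
Drops human-designed supervised sub-tasks, concentrating computation on elements directly related to navigation intent.
Introduces a temporal enhancement module that aligns predicted future scenes with actual future scenes through self-supervision.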
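
The temporal enhancement module is described only as aligning predicted future scenes with actual future scenes through self-supervision. A minimal sketch of that idea follows, assuming the prediction is made from the current sparse tokens plus the planned trajectory and that alignment uses a mean-squared-error loss. The predictor architecture, the trajectory conditioning, the loss choice, the names, and the shapes are all hypothetical.

```python
import numpy as np


def predict_future_tokens(tokens, plan, w_tok, w_plan):
    """Hypothetical one-layer predictor: current scene tokens plus a
    flattened planned trajectory -> predicted tokens at the next step."""
    # Residual update: the plan term (C,) is broadcast to every token (N, C).
    return tokens + np.tanh(tokens @ w_tok + plan @ w_plan)


def temporal_alignment_loss(pred_future, actual_future):
    """Mean squared error between predicted and actual future tokens.

    actual_future is assumed to come from encoding the next frame with the
    same encoder, so no human labels are needed. In an autodiff framework it
    would be held fixed as a target (e.g. torch's .detach() or
    jax.lax.stop_gradient); NumPy computes no gradients, so nothing is needed.
    """
    return float(np.mean((pred_future - actual_future) ** 2))


rng = np.random.default_rng(0)
NUM_TOKENS, C, PLAN_DIM = 16, 64, 12  # PLAN_DIM: assumed 6 waypoints x (x, y)
tokens_t = rng.standard_normal((NUM_TOKENS, C))
tokens_next = rng.standard_normal((NUM_TOKENS, C))  # stand-in next-frame tokens
plan = rng.standard_normal(PLAN_DIM)
w_tok = 0.1 * rng.standard_normal((C, C))
w_plan = 0.1 * rng.standard_normal((PLAN_DIM, C))

pred = predict_future_tokens(tokens_t, plan, w_tok, w_plan)
loss = temporal_alignment_loss(pred, tokens_next)
print(pred.shape, loss > 0.0)  # (16, 64) True
```

The point of the design is that the target comes from the model's own encoding of the next frame rather than from annotations, which is how such a module can add a training signal without extra labels.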