Two Tasks, One Goal: Uniting Motion and Planning for Excellent End To End Autonomous Driving Performance

📅 2025-04-17
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In end-to-end autonomous driving, decoupling planning from motion control limits generalization and prevents planning from exploiting the out-of-distribution data inherent in motion tasks. To address this, we propose TTOG, a two-stage trajectory generation framework that, for the first time, jointly optimizes planning and motion control. Our approach introduces: (1) an equivariant context-sharing scene adapter (ECSA) to enhance cross-agent representation generalization; (2) an ego-vehicle-driven surrounding-vehicle state estimation mechanism to mitigate observability limitations in multi-agent scenarios; and (3) a multi-agent collaborative representation learning paradigm. On the nuScenes open-loop benchmark, TTOG reduces L2 trajectory error by 36.06%; in closed-loop Bench2Drive evaluation, it improves the driving score by 22%, setting a new state-of-the-art.

📝 Abstract
End-to-end autonomous driving has made impressive progress in recent years. Prior end-to-end approaches often decouple the planning and motion tasks, treating them as separate modules. This separation overlooks the benefits planning could gain from learning on the out-of-distribution data encountered in motion tasks. Unifying the two tasks, however, poses significant challenges, such as constructing shared contextual representations and handling the unobservability of other vehicles' states. To address these challenges, we propose TTOG, a novel two-stage trajectory generation framework. In the first stage, a diverse set of trajectory candidates is generated; the second stage refines these candidates using vehicle state information. To mitigate the unavailability of surrounding vehicles' states, TTOG employs a state estimator trained on ego-vehicle data and subsequently extended to other vehicles. Furthermore, we introduce an equivariant context-sharing scene adapter (ECSA) to improve the generalization of scene representations across agents. Experimental results demonstrate that TTOG achieves state-of-the-art performance on both planning and motion tasks. Notably, on the challenging open-loop nuScenes benchmark, TTOG reduces the L2 distance by 36.06%, and on the closed-loop Bench2Drive benchmark it improves the driving score (DS) by 22%, significantly outperforming existing baselines.
Problem

Research questions and friction points this paper is trying to address.

Uniting motion and planning tasks for autonomous driving
Handling unobservability of other vehicles' states
Improving generalization of scene representations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage trajectory generation framework (TTOG)
State estimator trained on ego-vehicle data and extended to surrounding vehicles
Equivariant context-sharing scene adapter (ECSA)
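To make the two-stage idea concrete, here is a minimal sketch, not the paper's actual learned networks: all function names are hypothetical, candidate generation is reduced to a fan of constant-steer rollouts, the ego-trained state estimator is stood in for by a finite-difference velocity estimate, and stage-two refinement is reduced to picking the candidate with the largest minimum clearance from constant-velocity predictions of other vehicles.

```python
import math

def generate_candidates(ego_xy, heading, speed, n=5, horizon=8, dt=0.5):
    """Stage 1 (sketch): propose diverse trajectory candidates by
    fanning constant steering offsets around the current heading."""
    candidates = []
    for k in range(n):
        steer = (k - n // 2) * 0.05  # rad per step, symmetric fan
        x, y, th = ego_xy[0], ego_xy[1], heading
        traj = []
        for _ in range(horizon):
            th += steer
            x += speed * dt * math.cos(th)
            y += speed * dt * math.sin(th)
            traj.append((x, y))
        candidates.append(traj)
    return candidates

def estimate_other_state(obs_prev, obs_now, dt=0.5):
    """Stand-in for the ego-data-trained state estimator applied to
    other vehicles: finite-difference velocity from two observations."""
    vx = (obs_now[0] - obs_prev[0]) / dt
    vy = (obs_now[1] - obs_prev[1]) / dt
    return obs_now, (vx, vy)

def refine(candidates, others, dt=0.5):
    """Stage 2 (sketch): score each candidate against constant-velocity
    predictions of the other vehicles; keep the one whose closest
    approach to any other vehicle is largest."""
    def min_clearance(traj):
        worst = float("inf")
        for t, (x, y) in enumerate(traj, start=1):
            for (ox, oy), (vx, vy) in others:
                px, py = ox + vx * t * dt, oy + vy * t * dt
                worst = min(worst, math.hypot(x - px, y - py))
        return worst
    return max(candidates, key=min_clearance)
```

A toy run wires the pieces together: estimate the state of one observed vehicle, generate ego candidates, then refine. The real system replaces each stage with learned components sharing an ECSA-adapted scene representation.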
👥 Authors
Lin Liu
Beijing Key Laboratory of Traffic Data Mining and Embodied Intelligence, Beijing Jiaotong University
Ziying Song
Beijing Jiaotong University
Object Detection · Computer Vision · Deep Learning
Hongyu Pan
Alibaba DAMO Academy, Autonomous Driving Lab
Computer Vision · Detection · Segmentation · Point Cloud · Motion · End2End
Lei Yang
School of Vehicle and Mobility, Tsinghua University, China
Caiyan Jia
Beijing Key Laboratory of Traffic Data Mining and Embodied Intelligence, Beijing Jiaotong University