Two Tasks, One Goal: Uniting Motion and Planning for Excellent End To End Autonomous Driving Performance

📅 2025-04-17
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In end-to-end autonomous driving, decoupling planning from motion control limits generalization and prevents planning from exploiting the out-of-distribution data inherent in motion tasks. To address this, we propose TTOG, a two-stage trajectory generation framework that, for the first time, jointly optimizes planning and motion control. Our approach introduces: (1) an equivariant context-sharing scene adapter (ECSA) to enhance cross-agent representation generalization; (2) an ego-vehicle-driven surrounding-vehicle state estimation mechanism to mitigate observability limitations in multi-agent scenarios; and (3) a multi-agent collaborative representation learning paradigm. On the nuScenes open-loop benchmark, TTOG reduces L2 trajectory error by 36.06%; in closed-loop Bench2Drive evaluation, it improves the driving score by 22%, setting a new state-of-the-art.

📝 Abstract
End-to-end autonomous driving has made impressive progress in recent years. Prior end-to-end approaches often decouple the planning and motion tasks, treating them as separate modules. This separation overlooks the benefits planning could gain from learning on the out-of-distribution data encountered in motion tasks. Unifying the two tasks, however, poses significant challenges, such as constructing shared contextual representations and handling the unobservability of other vehicles' states. To address these challenges, we propose TTOG, a novel two-stage trajectory generation framework. In the first stage, a diverse set of trajectory candidates is generated; the second stage refines these candidates using vehicle state information. To mitigate the unavailability of surrounding vehicles' states, TTOG employs a state estimator trained on ego-vehicle data and subsequently extended to other vehicles. Furthermore, we introduce an equivariant context-sharing scene adapter (ECSA) to improve the generalization of scene representations across agents. Experimental results demonstrate that TTOG achieves state-of-the-art performance on both planning and motion tasks. Notably, on the challenging open-loop nuScenes benchmark, TTOG reduces the L2 distance by 36.06%, and on the closed-loop Bench2Drive benchmark it improves the driving score (DS) by 22%, significantly outperforming existing baselines.
Problem

Research questions and friction points this paper is trying to address.

Uniting motion and planning tasks for autonomous driving
Handling unobservability of other vehicles' states
Improving generalization of scene representations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage trajectory generation framework (TTOG)
State estimator trained on ego-vehicle data and extended to surrounding vehicles
Equivariant context-sharing scene adapter (ECSA)
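To make the two-stage idea concrete, here is a minimal sketch, not the paper's actual learned networks: all function names are hypothetical, candidate generation is reduced to a fan of constant-steer rollouts, the ego-trained state estimator is stood in for by a finite-difference velocity estimate, and stage-two refinement is reduced to picking the candidate with the largest minimum clearance from constant-velocity predictions of other vehicles.

```python
import math

def generate_candidates(ego_xy, heading, speed, n=5, horizon=8, dt=0.5):
    """Stage 1 (sketch): propose diverse trajectory candidates by
    fanning constant steering offsets around the current heading."""
    candidates = []
    for k in range(n):
        steer = (k - n // 2) * 0.05  # rad per step, symmetric fan
        x, y, th = ego_xy[0], ego_xy[1], heading
        traj = []
        for _ in range(horizon):
            th += steer
            x += speed * dt * math.cos(th)
            y += speed * dt * math.sin(th)
            traj.append((x, y))
        candidates.append(traj)
    return candidates

def estimate_other_state(obs_prev, obs_now, dt=0.5):
    """Stand-in for the ego-data-trained state estimator applied to
    other vehicles: finite-difference velocity from two observations."""
    vx = (obs_now[0] - obs_prev[0]) / dt
    vy = (obs_now[1] - obs_prev[1]) / dt
    return obs_now, (vx, vy)

def refine(candidates, others, dt=0.5):
    """Stage 2 (sketch): score each candidate against constant-velocity
    predictions of the other vehicles; keep the one whose closest
    approach to any other vehicle is largest."""
    def min_clearance(traj):
        worst = float("inf")
        for t, (x, y) in enumerate(traj, start=1):
            for (ox, oy), (vx, vy) in others:
                px, py = ox + vx * t * dt, oy + vy * t * dt
                worst = min(worst, math.hypot(x - px, y - py))
        return worst
    return max(candidates, key=min_clearance)
```

A toy run wires the pieces together: estimate the state of one observed vehicle, generate ego candidates, then refine. The real system replaces each stage with learned components sharing an ECSA-adapted scene representation.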
👥 Authors
Lin Liu
Beijing Key Laboratory of Traffic Data Mining and Embodied Intelligence, Beijing Jiaotong University
Ziying Song
Beijing Jiaotong University
Object Detection · Computer Vision · Deep Learning
Hongyu Pan
Alibaba DAMO Academy, Autonomous Driving Lab
Computer Vision · Detection · Segmentation · Point Cloud · Motion · End2End
Lei Yang
School of Vehicle and Mobility, Tsinghua University, China
Caiyan Jia
Beijing Key Laboratory of Traffic Data Mining and Embodied Intelligence, Beijing Jiaotong University