HEAT: Heterogeneous End-to-End Autonomous Driving via Trajectory-Guided World Models

📅 2026-05-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the performance degradation that end-to-end autonomous driving models suffer when trained jointly across multiple heterogeneous domains (different cities, sensor configurations, and traffic patterns), where domain-specific variations produce conflicting learning signals. To mitigate this, the authors propose a trajectory-centric learning paradigm that combines trajectory-guided supervision with an action-conditioned world model. The approach aims to learn domain-invariant representations of driving intent from raw sensor inputs, separating domain-specific variation from the underlying driving policy, so that a single unified model performs well across domains without domain-specific retraining. Experiments on three end-to-end driving benchmarks (nuScenes, NAVSIM, and the Waymo end-to-end dataset) show substantial improvements over existing methods, supporting the claim that one model can maintain strong performance across heterogeneous environments.

📝 Abstract

End-to-end autonomous driving has emerged as a compelling alternative to traditional modular pipelines by directly mapping raw sensor data to driving actions. While recent approaches achieve strong performance on single-domain datasets, their performance degrades significantly when trained jointly across multiple heterogeneous domains. In practice, however, autonomous systems must operate across diverse environments with heterogeneous distributions, including different cities, sensor configurations, and traffic patterns, without domain-specific retraining. This gap highlights a key challenge in multi-domain learning: domain-specific variations across heterogeneous domains introduce conflicting learning signals, driving models toward compromised solutions that are suboptimal across domains. To address this, we propose a trajectory-driven learning paradigm that organizes training around planning trajectories, enabling the model to capture domain-invariant representations of driving intent. Furthermore, we incorporate a world model that predicts future latent features conditioned on ego actions, improving feature consistency and mitigating domain-induced biases. We evaluate our approach on three benchmarks, nuScenes, NAVSIM, and the Waymo end-to-end dataset, and show substantial improvements over existing methods across all domains. Our results demonstrate that a single unified model can be trained on heterogeneous datasets while maintaining strong performance within each domain, highlighting a step toward scalable real-world deployment. We will make our code publicly available.
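The abstract names two training signals: supervision from planning trajectories and a world model that predicts future latent features conditioned on ego actions. The following is a minimal sketch of how such a combined objective could be wired together, not the authors' implementation. The linear encoder, planner, and world model, the mean-squared-error losses, the `wm_weight` coefficient, and all function names and dimensions are assumptions chosen for illustration; the source does not specify any of them.

```python
import random


def init_linear(rng, out_dim, in_dim, scale=0.1):
    """Create a small random weight matrix (list of rows) and a zero bias."""
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def linear(layer, x):
    """Apply y = W x + b for a (weights, bias) pair."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def init_params(rng, obs_dim, latent_dim, action_dim, traj_dim):
    """Hypothetical parameter layout: shared encoder, planner head, world model."""
    return {
        "encoder": init_linear(rng, latent_dim, obs_dim),
        "planner": init_linear(rng, traj_dim, latent_dim),
        # The world model sees the current latent concatenated with the ego action.
        "world_model": init_linear(rng, latent_dim, latent_dim + action_dim),
    }


def combined_loss(params, obs_t, obs_next, action, expert_traj, wm_weight=0.5):
    """Return (total, trajectory_loss, world_model_loss) for one sample."""
    z_t = linear(params["encoder"], obs_t)        # latent at time t
    z_next = linear(params["encoder"], obs_next)  # latent at time t+1 (target)

    # Trajectory supervision: planned waypoints vs. the expert trajectory.
    plan = linear(params["planner"], z_t)
    traj_loss = mse(plan, expert_traj)

    # Action-conditioned prediction of the next latent feature.
    z_pred = linear(params["world_model"], z_t + list(action))
    wm_loss = mse(z_pred, z_next)

    return traj_loss + wm_weight * wm_loss, traj_loss, wm_loss


if __name__ == "__main__":
    rng = random.Random(0)
    params = init_params(rng, obs_dim=6, latent_dim=4, action_dim=2, traj_dim=6)
    obs_t = [rng.random() for _ in range(6)]
    obs_next = [rng.random() for _ in range(6)]
    total, traj, wm = combined_loss(
        params, obs_t, obs_next, action=[1.0, 0.0], expert_traj=[0.0] * 6)
    print(f"total={total:.4f} trajectory={traj:.4f} world_model={wm:.4f}")
```

In this sketch the same encoder feeds both the planner and the world model, so the latent-prediction term acts as an auxiliary consistency signal on the shared features. A real system would use learned deep networks, gradient-based optimization, and batches drawn from each dataset.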

Problem

Research questions and friction points this paper is trying to address.

end-to-end autonomous driving

multi-domain learning

heterogeneous domains

domain generalization

trajectory prediction

Innovation

Methods, ideas, or system contributions that make the work stand out.

trajectory-guided learning

world model

heterogeneous domains