🤖 AI Summary
End-to-end autonomous driving suffers from poor cross-city generalization, high domain adaptation cost, and difficulty balancing inference efficiency with performance stability. To address this, we propose a cross-domain generalization framework based on joint probabilistic modeling. The approach uses Gaussian processes (GPs) to learn a scene-agnostic set of primitive trajectory tokens, enabling zero-shot domain adaptation without fine-tuning or added inference latency. We represent driving scenes as composable trajectory tokens and jointly model their spatiotemporal distributions and inter-domain uncertainties in a unified probabilistic formulation. Evaluated on real-world road data across multiple cities, our method significantly outperforms full-parameter fine-tuning baselines, with an average 12.7% improvement in driving success rate, while preserving the original model’s computational complexity and inference latency. To the best of our knowledge, this is the first method enabling robust, retraining-free cross-domain deployment with no additional inference cost.
📝 Abstract
End-to-end (E2E) autonomous driving has recently emerged as a new paradigm, offering significant potential. However, few studies have looked into the practical challenge of deployment across domains (e.g., cities). Although several works have incorporated Large Language Models (LLMs) to leverage their open-world knowledge, LLMs do not guarantee cross-domain driving performance and may incur prohibitive retraining costs during domain adaptation. In this paper, we propose RoCA, a novel framework for robust cross-domain E2E autonomous driving. RoCA formulates the joint probabilistic distribution over the tokens that encode ego and surrounding vehicle information in the E2E pipeline. Instantiating with a Gaussian process (GP), RoCA learns a set of basis tokens with corresponding trajectories, which span diverse driving scenarios. Then, given any driving scene, it is able to probabilistically infer the future trajectory. By using RoCA together with a base E2E model in source-domain training, we improve the generalizability of the base model, without requiring extra inference computation. In addition, RoCA enables robust adaptation on new target domains, significantly outperforming direct finetuning. We extensively evaluate RoCA on various cross-domain scenarios and show that it achieves strong domain generalization and adaptation performance.
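The GP step described in the abstract, inferring a trajectory for a new scene from a set of basis tokens with known trajectories, can be illustrated with standard GP regression. The following is a minimal sketch and not RoCA's implementation: the RBF kernel, the noise level, the token and trajectory dimensions, the random data, and the names `rbf_kernel` and `gp_predict` are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between rows of a (n, d) and b (m, d).

    The kernel choice is an assumption; the abstract does not specify one.
    """
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gp_predict(basis_tokens, basis_trajs, query_tokens, noise=1e-3, length_scale=1.0):
    """Standard GP posterior mean and per-query variance.

    basis_tokens: (n, d) token embeddings (hypothetical stand-ins).
    basis_trajs:  (n, t) flattened future waypoints paired with each token.
    query_tokens: (m, d) tokens of the scenes to predict for.
    """
    k_bb = rbf_kernel(basis_tokens, basis_tokens, length_scale)
    k_qb = rbf_kernel(query_tokens, basis_tokens, length_scale)
    chol = np.linalg.cholesky(k_bb + noise * np.eye(len(basis_tokens)))
    # alpha = (K_bb + noise * I)^{-1} Y, computed via two triangular systems.
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, basis_trajs))
    mean = k_qb @ alpha
    # Posterior variance: k(x, x) - k_qb (K_bb + noise * I)^{-1} k_qb^T, with k(x, x) = 1 for RBF.
    v = np.linalg.solve(chol, k_qb.T)
    var = 1.0 - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 0.0)


# Toy data: 16 basis tokens of dimension 8, each paired with 6 (x, y) waypoints.
rng = np.random.default_rng(0)
basis_tokens = rng.normal(size=(16, 8))
basis_trajs = rng.normal(size=(16, 12))

# One query identical to a basis token, one far from every basis token.
query_tokens = np.vstack([basis_tokens[0], basis_tokens[0] + 50.0])
mean, var = gp_predict(basis_tokens, basis_trajs, query_tokens)
```

In this toy setup the query that matches a basis token recovers that token's trajectory with near-zero variance, while the distant query falls back to the zero prior mean with variance close to 1. That predictive variance is the kind of uncertainty signal a probabilistic formulation makes available for scenes unlike those covered by the basis set.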