AI Summary
To address the loss of robustness in end-to-end autonomous driving caused by differing camera viewpoints, this paper proposes VR-Drive, a viewpoint-robust end-to-end framework. Methodologically, it uses feed-forward 3D Gaussian splatting for novel-view synthesis without additional annotations, jointly optimizing 3D scene reconstruction and trajectory planning. It further introduces a viewpoint-mixed memory bank and a viewpoint-consistent distillation mechanism, which enable online training-time augmentation from sparse views and temporal modeling across viewpoints. Experiments on a newly released novel-viewpoint benchmark show that VR-Drive mitigates synthesis-induced noise and improves planning generalization and stability under unseen viewpoints. The authors present the framework as a scalable and robust solution for real-world deployment of end-to-end autonomous driving systems.
Abstract
End-to-end autonomous driving (E2E-AD) has emerged as a promising paradigm that unifies perception, prediction, and planning into a holistic, data-driven framework. However, achieving robustness to varying camera viewpoints, a common real-world challenge due to diverse vehicle configurations, remains an open problem. In this work, we propose VR-Drive, a novel E2E-AD framework that addresses viewpoint generalization by jointly learning 3D scene reconstruction as an auxiliary task to enable planning-aware view synthesis. Unlike prior scene-specific synthesis approaches, VR-Drive adopts a feed-forward inference strategy that supports online training-time augmentation from sparse views without additional annotations. To further improve viewpoint consistency, we introduce a viewpoint-mixed memory bank that facilitates temporal interaction across multiple viewpoints and a viewpoint-consistent distillation strategy that transfers knowledge from original to synthesized views. Trained in a fully end-to-end manner, VR-Drive effectively mitigates synthesis-induced noise and improves planning under viewpoint shifts. In addition, we release a new benchmark dataset to evaluate E2E-AD performance under novel camera viewpoints, enabling comprehensive analysis. Our results demonstrate that VR-Drive is a scalable and robust solution for the real-world deployment of end-to-end autonomous driving systems.