VR-Drive: Viewpoint-Robust End-to-End Driving with Feed-Forward 3D Gaussian Splatting

πŸ“… 2025-10-27
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
To address the robustness degradation that inter-camera viewpoint discrepancies cause in end-to-end autonomous driving, this paper proposes VR-Drive, a viewpoint-robust end-to-end framework. Methodologically, it introduces feed-forward 3D Gaussian splatting for annotation-free novel-view synthesis, jointly optimizing 3D scene reconstruction and trajectory planning; it further designs a viewpoint-mixed memory bank and a viewpoint-consistent distillation strategy to enable online training-time augmentation from sparse views and temporal modeling across viewpoints. Experiments on a newly released multi-viewpoint benchmark show that VR-Drive effectively suppresses synthesis-induced noise and markedly improves planning generalization and stability under unseen camera viewpoints. The framework offers a scalable path toward real-world deployment of end-to-end autonomous driving systems.
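The viewpoint-consistent distillation strategy described above can be illustrated with a minimal sketch. The paper does not publish implementation details, so the function below is a hypothetical reading: an L2 penalty that pulls features computed on a synthesized view (student) toward features from the original view (teacher), with the teacher branch treated as a fixed target.

```python
import numpy as np

def viewpoint_consistent_distill_loss(teacher_feats, student_feats):
    """Hypothetical sketch of viewpoint-consistent distillation:
    an L2 loss that transfers knowledge from original-view features
    (teacher) to synthesized-view features (student).

    In a real training framework the teacher branch would be detached
    (stop-gradient) so that noise in synthesized views cannot corrupt
    the original-view representation.
    """
    t = np.asarray(teacher_feats, dtype=np.float64)
    s = np.asarray(student_feats, dtype=np.float64)
    return float(np.mean((t - s) ** 2))
```

The names and the exact loss form (plain MSE on features) are assumptions for illustration; the actual method may distill at multiple stages of the planning pipeline.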

πŸ“ Abstract
End-to-end autonomous driving (E2E-AD) has emerged as a promising paradigm that unifies perception, prediction, and planning into a holistic, data-driven framework. However, achieving robustness to varying camera viewpoints, a common real-world challenge due to diverse vehicle configurations, remains an open problem. In this work, we propose VR-Drive, a novel E2E-AD framework that addresses viewpoint generalization by jointly learning 3D scene reconstruction as an auxiliary task to enable planning-aware view synthesis. Unlike prior scene-specific synthesis approaches, VR-Drive adopts a feed-forward inference strategy that supports online training-time augmentation from sparse views without additional annotations. To further improve viewpoint consistency, we introduce a viewpoint-mixed memory bank that facilitates temporal interaction across multiple viewpoints and a viewpoint-consistent distillation strategy that transfers knowledge from original to synthesized views. Trained in a fully end-to-end manner, VR-Drive effectively mitigates synthesis-induced noise and improves planning under viewpoint shifts. In addition, we release a new benchmark dataset to evaluate E2E-AD performance under novel camera viewpoints, enabling comprehensive analysis. Our results demonstrate that VR-Drive is a scalable and robust solution for the real-world deployment of end-to-end autonomous driving systems.
Problem

Research questions and friction points this paper is trying to address.

Achieving robustness to varying camera viewpoints in end-to-end autonomous driving
Learning 3D scene reconstruction as an auxiliary task for planning-aware view synthesis
Maintaining planning performance under viewpoint shifts without scene-specific retraining
Innovation

Methods, ideas, or system contributions that make the work stand out.

Jointly learning 3D scene reconstruction with planning to enable planning-aware view synthesis
Feed-forward inference that supports online training-time augmentation from sparse views
Viewpoint-mixed memory bank for cross-view temporal interaction, plus viewpoint-consistent distillation
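The viewpoint-mixed memory bank can be pictured as a small fixed-size buffer holding per-frame features tagged by their viewpoint, over which a temporal module can attend. The class below is a minimal sketch under that assumption; the class name, capacity, and tagging scheme are illustrative, not from the paper.

```python
from collections import deque

class ViewpointMixedMemoryBank:
    """Hypothetical sketch: a fixed-size FIFO bank storing per-frame
    features tagged with their source viewpoint (original or
    synthesized), so temporal interaction can mix viewpoints."""

    def __init__(self, capacity=4):
        # deque(maxlen=...) evicts the oldest entry automatically.
        self.buffer = deque(maxlen=capacity)

    def push(self, feats, viewpoint):
        self.buffer.append((viewpoint, feats))

    def read(self):
        # Return all stored (viewpoint, feats) pairs, oldest first,
        # as the candidate set for temporal attention.
        return list(self.buffer)

bank = ViewpointMixedMemoryBank(capacity=2)
bank.push([0.1, 0.2], "original")
bank.push([0.3, 0.4], "synthesized")
bank.push([0.5, 0.6], "original")  # evicts the oldest entry
```

After the third push the bank holds one synthesized-view and one original-view entry, which is the property the mixed design relies on: temporal context is drawn from both real and augmented viewpoints.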