Towards Viewpoint-Robust End-to-End Autonomous Driving with 3D Foundation Model Priors

📅 2026-04-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited robustness of end-to-end autonomous driving systems to camera viewpoint variations by proposing an augmentation-free approach that, for the first time, integrates geometric priors from a 3D foundation model into an end-to-end architecture. The method derives pixel-wise 3D coordinates from depth estimates to serve as positional embeddings and fuses intermediate geometric features via cross-attention, improving the model's adaptability to viewpoint perturbations. On the VR-Drive benchmark, the approach reduces performance degradation under most viewpoint shifts, with the clearest gains under pitch-angle and height variations.
📝 Abstract
Robust trajectory planning under camera viewpoint changes is important for scalable end-to-end autonomous driving. However, existing models often depend heavily on the camera viewpoints seen during training. We investigate an augmentation-free approach that leverages geometric priors from a 3D foundation model. The method injects per-pixel 3D positions derived from depth estimates as positional embeddings and fuses intermediate geometric features through cross-attention. Experiments on the VR-Drive camera viewpoint perturbation benchmark show reduced performance degradation under most perturbation conditions, with clear improvements under pitch and height perturbations. Gains under longitudinal translation are smaller, suggesting that more viewpoint-agnostic integration is needed for robustness to camera viewpoint changes.
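The two ingredients the abstract names, back-projecting a depth map into per-pixel 3D coordinates and fusing features via cross-attention, can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the function names, camera intrinsics, and the single-head attention without learned projections are all illustrative assumptions.

```python
import numpy as np

def unproject_depth(depth, K):
    """Back-project a depth map to per-pixel 3D camera coordinates.
    depth: (H, W) metric depth; K: (3, 3) camera intrinsics."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)  # (H, W, 3)
    rays = pix @ np.linalg.inv(K).T       # viewing ray per pixel
    return rays * depth[..., None]        # (H, W, 3) 3D points

def cross_attention(q, kv, d):
    """Single-head scaled dot-product cross-attention (no learned projections,
    for illustration only): queries attend over geometric features kv."""
    logits = q @ kv.T / np.sqrt(d)
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ kv

# Toy camera: 64x48 image, focal length 500 px, principal point at center.
K = np.array([[500., 0., 32.],
              [0., 500., 24.],
              [0., 0., 1.]])
depth = np.full((48, 64), 10.0)           # flat wall 10 m ahead
pts = unproject_depth(depth, K)           # per-pixel 3D positions
# pts could then be encoded (e.g. sinusoidally) and added as positional embeddings,
# while intermediate geometric features are fused via cross_attention.
```

In this sketch the principal-point pixel unprojects to (0, 0, 10), i.e. straight ahead at the wall's depth, which is the sanity check that the intrinsics are applied correctly.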
Problem

Research questions and friction points this paper is trying to address.

viewpoint robustness
autonomous driving
camera viewpoint changes
trajectory planning
3D foundation model
Innovation

Methods, ideas, or system contributions that make the work stand out.

3D foundation model
viewpoint robustness
positional embedding
cross-attention
end-to-end autonomous driving
🔎 Similar Papers
No similar papers found.
Hiroki Hashimoto
Chiba University
Hiromichi Goto
SUZUCA.AI
Hiroyuki Sugai
SUZUCA.AI
Hiroshi Kera
Chiba University
Approximate Computer Algebra · Adversarial Machine Learning · Math Transformer
Kazuhiko Kawamoto
Chiba University