Spatial Retrieval Augmented Autonomous Driving

📅 2025-12-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current autonomous driving systems rely solely on onboard sensors, rendering them vulnerable to a limited field of view, occlusions, and adverse weather, while lacking long-term memory of road topology. To address this, we propose a spatial retrieval-augmented paradigm that, for the first time, integrates offline-acquired georeferenced imagery (e.g., Google Maps snapshots) as a plug-and-play modality into a multi-task autonomous driving framework—requiring no additional hardware and enabling “memory-augmented” environmental perception. Our method aligns ego-vehicle trajectories with geographic images to fuse spatial priors, unifying support for object detection, HD mapping, occupancy prediction, end-to-end planning, and generative world modeling. Evaluated on nuScenes, it consistently improves performance across all tasks. We open-source our dataset, code, and benchmark to foster research on retrieval-augmented, spatially aware autonomous driving.

📝 Abstract
Existing autonomous driving systems rely on onboard sensors (cameras, LiDAR, IMU, etc.) for environmental perception. However, this paradigm is limited by the drive-time perception horizon and often fails under a limited view scope, occlusion, or extreme conditions such as darkness and rain. In contrast, human drivers can recall road structure even under poor visibility. To endow models with this "recall" ability, we propose the spatial retrieval paradigm, introducing offline-retrieved geographic images as an additional input. These images are easy to obtain from offline caches (e.g., Google Maps or stored autonomous driving datasets) without requiring additional sensors, making the paradigm a plug-and-play extension for existing AD tasks. For experiments, we first extend the nuScenes dataset with geographic images retrieved via the Google Maps APIs and align the new data with ego-vehicle trajectories. We establish baselines across five core autonomous driving tasks: object detection, online mapping, occupancy prediction, end-to-end planning, and generative world modeling. Extensive experiments show that the extended modality can enhance performance on several of these tasks. We will open-source the dataset curation code, data, and benchmarks for further study of this new autonomous driving paradigm.
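The abstract's retrieval step — fetching a geographic image for an ego-vehicle pose from Google Maps — could be sketched as below. This is not the paper's released code; the endpoint and parameters follow the public Google Maps Static API, while the zoom level, image size, and example coordinates are illustrative assumptions.

```python
# Illustrative sketch (not the paper's implementation): build a Google Maps
# Static API request URL centered on an ego-vehicle GPS pose, so the
# snapshot can be cached offline as the retrieved geographic image.
from urllib.parse import urlencode

STATIC_MAP_ENDPOINT = "https://maps.googleapis.com/maps/api/staticmap"

def build_map_request(lat: float, lon: float, api_key: str,
                      zoom: int = 19, size: str = "640x640") -> str:
    """Return a Static Maps URL centered on (lat, lon) in WGS84."""
    params = {
        "center": f"{lat},{lon}",   # ego pose, "lat,lon"
        "zoom": zoom,               # ~street level (assumed value)
        "size": size,               # pixel size of the snapshot
        "maptype": "satellite",     # aerial imagery as a spatial prior
        "key": api_key,
    }
    return f"{STATIC_MAP_ENDPOINT}?{urlencode(params)}"

# Example: an illustrative pose near the nuScenes Boston area
url = build_map_request(42.3365, -71.0578, api_key="YOUR_KEY")
```

Fetching the URL (e.g., with `requests.get`) and storing the response per sample would yield the offline cache the abstract describes; no sensor hardware is involved.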
Problem

Research questions and friction points this paper is trying to address.

Enhancing autonomous driving perception with offline geographic images
Addressing sensor limitations in adverse visibility conditions
Extending autonomous systems' recall ability via spatial retrieval
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces spatial retrieval with offline geographic images
Extends nuScenes dataset using Google Maps APIs
Enhances autonomous driving tasks via plug-and-play modality
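The alignment of ego-vehicle trajectories with retrieved map snapshots, mentioned in the abstract, plausibly involves projecting GPS poses into the image frame. A minimal sketch under that assumption, using the standard Web Mercator projection (the paper's exact alignment procedure is in its released code):

```python
# Hedged sketch: project a WGS84 lat/lon pose into pixel coordinates of a
# Web Mercator map snapshot, so an ego trajectory can be overlaid on the
# retrieved geographic image. Standard slippy-map math; not paper-specific.
import math

TILE_SIZE = 256  # Web Mercator base tile size in pixels

def latlon_to_pixels(lat: float, lon: float, zoom: int) -> tuple[float, float]:
    """Global Web Mercator pixel coordinates at the given zoom level."""
    scale = TILE_SIZE * (2 ** zoom)
    x = (lon + 180.0) / 360.0 * scale
    lat_rad = math.radians(lat)
    y = (1.0 - math.log(math.tan(lat_rad) + 1.0 / math.cos(lat_rad))
         / math.pi) / 2.0 * scale
    return x, y

def to_image_frame(lat, lon, center_lat, center_lon, zoom, width, height):
    """Pixel position of a pose inside a snapshot centered at (center_lat,
    center_lon) with the given pixel width/height."""
    px, py = latlon_to_pixels(lat, lon, zoom)
    cx, cy = latlon_to_pixels(center_lat, center_lon, zoom)
    return px - cx + width / 2.0, py - cy + height / 2.0
```

Applying `to_image_frame` to each trajectory waypoint gives per-frame pixel coordinates in the snapshot, which is one way the geographic prior could be spatially registered with onboard observations.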
Xiaosong Jia
Assistant Professor, Institute of Trustworthy Embodied AI (TEAI), Fudan University
Embodied AI · Autonomous Driving · World Model · Reinforcement Learning
Chenhe Zhang
Institute of Trustworthy Embodied AI, Fudan University
Yule Jiang
Shanghai Jiao Tong University
Songbur Wong
Shanghai Jiao Tong University
Zhiyuan Zhang
Shanghai Jiao Tong University
Chen Chen
Key Laboratory of Target Cognition and Application Technology, Aerospace Information Research Institute, Chinese Academy of Sciences
Shaofeng Zhang
University of Science and Technology of China
Xuanhe Zhou
Assistant Professor, Shanghai Jiao Tong University
Data Management · Artificial Intelligence
Xue Yang
Shanghai Jiao Tong University
Junchi Yan
FIAPR & ICML Board Member, SJTU (2018-), SII (2024-), AWS (2019-2022), IBM (2011-2018)
Computational Intelligence · AI4Science · Machine Learning · Autonomous Driving
Yu-Gang Jiang
Professor, Fudan University. IEEE & IAPR Fellow
Video Analysis · Embodied AI · Trustworthy AI