🤖 AI Summary
Current autonomous driving systems rely solely on onboard sensors, leaving them vulnerable to a limited field of view, occlusions, and adverse weather, and without long-term memory of road topology. To address this, we propose a spatial retrieval-augmented paradigm that, for the first time, integrates offline-acquired georeferenced imagery (e.g., Google Maps snapshots) as a plug-and-play modality in a multi-task autonomous driving framework, requiring no additional hardware and enabling "memory-augmented" environmental perception. Our method aligns ego-vehicle trajectories with geographic images to fuse spatial priors, with unified support for object detection, HD mapping, occupancy prediction, end-to-end planning, and generative world modeling. Evaluated on nuScenes, the added modality improves performance on several of these tasks. We open-source our dataset, code, and benchmark to foster research on retrieval-augmented, spatially aware autonomous driving.
📝 Abstract
Existing autonomous driving systems rely on onboard sensors (cameras, LiDAR, IMU, etc.) for environmental perception. However, this paradigm is limited to the drive-time perception horizon and often fails under a restricted field of view, occlusion, or extreme conditions such as darkness and rain. In contrast, human drivers can recall road structure even under poor visibility. To endow models with this "recall" ability, we propose the spatial retrieval paradigm, which introduces offline-retrieved geographic images as an additional input. These images are easy to obtain from offline caches (e.g., Google Maps or stored autonomous driving datasets) without requiring additional sensors, making the new modality a plug-and-play extension for existing AD tasks.
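To make the retrieval step concrete, below is a minimal sketch, assuming the Google Static Maps API as the offline image source. The endpoint and its `center`/`zoom`/`size`/`maptype`/`key` parameters are part of that public API, but the zoom level, tile size, and `GOOGLE_API_KEY` placeholder are illustrative choices, not the paper's exact configuration.

```python
# Hedged sketch: fetch a georeferenced satellite tile centered on the
# ego position. Endpoint and parameters follow the public Google
# Static Maps API; zoom/size/key values are illustrative placeholders.
import requests

STATIC_MAPS_URL = "https://maps.googleapis.com/maps/api/staticmap"

def retrieve_geo_image(lat: float, lon: float, zoom: int = 19,
                       api_key: str = "GOOGLE_API_KEY") -> bytes:
    """Download one satellite tile centered on (lat, lon) in WGS84."""
    params = {
        "center": f"{lat},{lon}",
        "zoom": zoom,               # higher zoom = finer ground resolution
        "size": "640x640",          # standard maximum tile size
        "maptype": "satellite",
        "key": api_key,
    }
    resp = requests.get(STATIC_MAPS_URL, params=params, timeout=10)
    resp.raise_for_status()
    return resp.content             # PNG bytes; cache offline, replay at drive time
```

Because the tiles are fetched once and cached, the retrieval adds no drive-time sensing requirements: at inference the model simply looks up the cached image nearest to the current ego position.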
For experiments, we first extend the nuScenes dataset with geographic images retrieved via the Google Maps APIs and align the new data with ego-vehicle trajectories. We then establish baselines across five core autonomous driving tasks: object detection, online mapping, occupancy prediction, end-to-end planning, and generative world modeling. Extensive experiments show that the added modality improves performance on some of these tasks. We will open-source the dataset curation code, data, and benchmarks to support further study of this new autonomous driving paradigm.
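For the trajectory alignment, a minimal sketch is shown below: nuScenes ego poses are given in meters in a per-location map frame, so converting them to WGS84 coordinates lets them key the retrieval above. The per-location reference origins mirror those published in the nuScenes devkit (`scripts/export_poses.py`); the values below are quoted from memory and should be verified against your devkit version, and the equirectangular approximation is an assumption (accurate to well under a meter at city scale), not necessarily the paper's alignment procedure.

```python
# Hedged sketch: align nuScenes map-frame ego poses with geographic
# coordinates. Origins mirror REFERENCE_COORDINATES in the nuScenes
# devkit (scripts/export_poses.py); verify before relying on them.
import math

MAP_ORIGINS = {  # location -> (origin latitude, origin longitude), WGS84
    "boston-seaport":           (42.336849, -71.057854),
    "singapore-onenorth":       (1.288210, 103.784752),
    "singapore-hollandvillage": (1.299365, 103.782177),
    "singapore-queenstown":     (1.278256, 103.767414),
}

METERS_PER_DEG_LAT = 111_320.0  # mean meters per degree of latitude

def ego_pose_to_wgs84(x: float, y: float, location: str) -> tuple[float, float]:
    """Map-frame meters (assumed x east, y north) -> (lat, lon)."""
    lat0, lon0 = MAP_ORIGINS[location]
    lat = lat0 + y / METERS_PER_DEG_LAT
    lon = lon0 + x / (METERS_PER_DEG_LAT * math.cos(math.radians(lat0)))
    return lat, lon
```

Chaining `ego_pose_to_wgs84` into `retrieve_geo_image` yields, for each keyframe, a geographic image keyed to the ego trajectory, which is the alignment the extended dataset and benchmarks build on.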