🤖 AI Summary
Existing monocular street-view 3D occupancy prediction methods suffer from occlusion and weak long-range perception. This paper introduces historical satellite imagery into real-time autonomous driving perception for the first time, proposing a multi-view fusion framework to address cross-view spatiotemporal asynchrony and the 2D→3D geometric modeling challenge. Key contributions include: (1) GPS/IMU-driven spatiotemporal registration; (2) a dynamic decoupled fusion mechanism that separates features from occluded and non-occluded regions; and (3) 3D-projection-guided satellite feature enhancement coupled with dual-view voxel uniform sampling alignment. Evaluated on Occ3D-nuScenes, our method achieves a single-frame mIoU of 39.05%, surpassing the state-of-the-art by 6.97 percentage points, with only 6.93 ms additional latency. It significantly improves robustness in long-range and occluded scenarios.
📝 Abstract
Existing vision-based 3D occupancy prediction methods are inherently limited in accuracy due to their exclusive reliance on street-view imagery, neglecting the potential benefits of incorporating satellite views. We propose SA-Occ, the first Satellite-Assisted 3D occupancy prediction model, which leverages GPS and IMU data to integrate historical yet readily available satellite imagery into real-time applications, effectively mitigating the limitations of ego-vehicle perception, such as occlusion and degraded performance in distant regions. To address the core challenges of cross-view perception, we propose: 1) Dynamic-Decoupling Fusion, which resolves inconsistencies in dynamic regions caused by the temporal asynchrony between satellite and street views; 2) 3D-Proj Guidance, a module that enhances 3D feature extraction from inherently 2D satellite imagery; and 3) Uniform Sampling Alignment, which aligns the sampling density between street and satellite views. Evaluated on Occ3D-nuScenes, SA-Occ achieves state-of-the-art performance, especially among single-frame methods, with a 39.05% mIoU (a 6.97% improvement), while incurring only 6.93 ms of additional latency per frame. Our code and newly curated dataset are available at https://github.com/chenchen235/SA-Occ.
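The GPS-driven part of the spatiotemporal registration rests on a standard map-projection step: converting the ego vehicle's GPS fix into pixel coordinates of the satellite basemap so the matching patch can be cropped and aligned. The sketch below illustrates this with the Web Mercator projection used by common web satellite tiles; the function names, tile layout, and zoom level are illustrative assumptions, not details from the paper:

```python
import math

TILE_SIZE = 256  # standard web-map tile size in pixels


def latlon_to_global_pixels(lat_deg: float, lon_deg: float, zoom: int):
    """Project a WGS-84 lat/lon fix to global Web Mercator pixel coordinates
    at the given zoom level (standard slippy-map formula)."""
    scale = TILE_SIZE * (2 ** zoom)
    x = (lon_deg + 180.0) / 360.0 * scale
    lat_rad = math.radians(lat_deg)
    y = (1.0 - math.log(math.tan(lat_rad) + 1.0 / math.cos(lat_rad)) / math.pi) / 2.0 * scale
    return x, y


def register_ego_to_tile(ego_lat, ego_lon, tile_origin_lat, tile_origin_lon, zoom=19):
    """Pixel offset of the ego vehicle inside a satellite image whose
    top-left corner is georeferenced at (tile_origin_lat, tile_origin_lon).
    This offset is what a crop/alignment step would consume."""
    ex, ey = latlon_to_global_pixels(ego_lat, ego_lon, zoom)
    ox, oy = latlon_to_global_pixels(tile_origin_lat, tile_origin_lon, zoom)
    return ex - ox, ey - oy
```

The IMU heading would then rotate the cropped patch into the ego frame; that step is a simple 2D rotation and is omitted here.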