🤖 AI Summary
Existing monocular street-view 3D occupancy prediction methods suffer from occlusion and weak long-range perception. This paper introduces historical satellite imagery into real-time autonomous driving perception for the first time, proposing a multi-view fusion framework to address cross-view spatiotemporal asynchrony and the 2D→3D geometric modeling challenge. Key contributions include: (1) GPS/IMU-driven spatiotemporal registration; (2) a dynamic decoupled fusion mechanism that separates features from occluded and non-occluded regions; and (3) 3D-projection-guided satellite feature enhancement coupled with dual-view voxel uniform sampling alignment. Evaluated on Occ3D-nuScenes, our method achieves a single-frame mIoU of 39.05%, surpassing the state-of-the-art by 6.97 percentage points, with only 6.93 ms additional latency. It significantly improves robustness in long-range and occluded scenarios.
📝 Abstract
Existing vision-based 3D occupancy prediction methods are inherently limited in accuracy due to their exclusive reliance on street-view imagery, neglecting the potential benefits of incorporating satellite views. We propose SA-Occ, the first Satellite-Assisted 3D occupancy prediction model, which leverages GPS and IMU data to integrate historical yet readily available satellite imagery into real-time applications, effectively mitigating the limitations of ego-vehicle perception, such as occlusion and degraded performance in distant regions. To address the core challenges of cross-view perception, we propose: 1) Dynamic-Decoupling Fusion, which resolves inconsistencies in dynamic regions caused by the temporal asynchrony between satellite and street views; 2) 3D-Proj Guidance, a module that enhances 3D feature extraction from inherently 2D satellite imagery; and 3) Uniform Sampling Alignment, which aligns the sampling density between street and satellite views. Evaluated on Occ3D-nuScenes, SA-Occ achieves state-of-the-art performance, especially among single-frame methods, with a 39.05% mIoU (a 6.97% improvement), while incurring only 6.93 ms of additional latency per frame. Our code and newly curated dataset are available at https://github.com/chenchen235/SA-Occ.
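The GPS-driven part of the spatiotemporal registration rests on a standard map-projection step: converting the ego vehicle's GPS fix into pixel coordinates of the satellite basemap so the matching patch can be cropped and aligned. The sketch below illustrates this with the Web Mercator projection used by common web satellite tiles; the function names, tile layout, and zoom level are illustrative assumptions, not details from the paper:

```python
import math

TILE_SIZE = 256  # standard web-map tile size in pixels


def latlon_to_global_pixels(lat_deg: float, lon_deg: float, zoom: int):
    """Project a WGS-84 lat/lon fix to global Web Mercator pixel coordinates
    at the given zoom level (standard slippy-map formula)."""
    scale = TILE_SIZE * (2 ** zoom)
    x = (lon_deg + 180.0) / 360.0 * scale
    lat_rad = math.radians(lat_deg)
    y = (1.0 - math.log(math.tan(lat_rad) + 1.0 / math.cos(lat_rad)) / math.pi) / 2.0 * scale
    return x, y


def register_ego_to_tile(ego_lat, ego_lon, tile_origin_lat, tile_origin_lon, zoom=19):
    """Pixel offset of the ego vehicle inside a satellite image whose
    top-left corner is georeferenced at (tile_origin_lat, tile_origin_lon).
    This offset is what a crop/alignment step would consume."""
    ex, ey = latlon_to_global_pixels(ego_lat, ego_lon, zoom)
    ox, oy = latlon_to_global_pixels(tile_origin_lat, tile_origin_lon, zoom)
    return ex - ox, ey - oy
```

The IMU heading would then rotate the cropped patch into the ego frame; that step is a simple 2D rotation and is omitted here.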