SA-Occ: Satellite-Assisted 3D Occupancy Prediction in Real World

📅 2025-03-20
🤖 AI Summary
Existing street-view-only 3D occupancy prediction methods suffer from occlusion and weak long-range perception. This paper introduces historical satellite imagery into real-time autonomous driving perception for the first time, proposing a cross-view fusion framework that addresses the spatiotemporal asynchrony between satellite and street views and the challenge of extracting 3D geometry from 2D satellite imagery. Key contributions include: (1) GPS/IMU-driven spatiotemporal registration of the satellite and street views; (2) a Dynamic-Decoupling Fusion mechanism that handles dynamic regions separately, because moving objects in the historical satellite image no longer match the live street view; and (3) 3D-projection-guided satellite feature enhancement (3D-Proj Guidance) together with Uniform Sampling Alignment, which matches the sampling density of the street and satellite views. Evaluated on Occ3D-nuScenes, SA-Occ achieves a single-frame mIoU of 39.05%, surpassing the previous state of the art by 6.97 percentage points, with only 6.93 ms of additional latency per frame. It notably improves robustness in long-range and occluded regions.
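
The GPS/IMU-driven registration step can be pictured as mapping every ego-frame BEV cell to a pixel in a georeferenced satellite tile. The sketch below is an illustrative assumption, not SA-Occ's code: it assumes a north-up tile with a known top-left origin and ground resolution, a GPS/IMU pose already converted to a local metric frame, and an 80 m × 80 m grid at 0.4 m per cell (the commonly used Occ3D-nuScenes layout). `EgoPose`, `SatelliteTile`, `ego_to_satellite_px`, and `bev_sampling_grid` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class EgoPose:
    """Hypothetical GPS/IMU pose in a local metric map frame (x east, y north)."""
    x_m: float
    y_m: float
    yaw_rad: float  # heading, counter-clockwise from the map x-axis


@dataclass
class SatelliteTile:
    """Hypothetical north-up satellite tile with a known georeference."""
    origin_x_m: float  # map x of the tile's top-left corner
    origin_y_m: float  # map y of the tile's top-left corner
    metres_per_px: float


def ego_to_satellite_px(px_ego, py_ego, pose, tile):
    """Map an ego-frame point (metres, x forward, y left) to (col, row) in the tile."""
    c, s = math.cos(pose.yaw_rad), math.sin(pose.yaw_rad)
    # Ego frame -> map frame: rotate by the IMU yaw, translate by the GPS position.
    mx = pose.x_m + c * px_ego - s * py_ego
    my = pose.y_m + s * px_ego + c * py_ego
    # Map frame -> pixel grid; image rows grow downward, i.e. southward.
    col = (mx - tile.origin_x_m) / tile.metres_per_px
    row = (tile.origin_y_m - my) / tile.metres_per_px
    return col, row


def bev_sampling_grid(pose, tile, half_range_m=40.0, cell_m=0.4):
    """Satellite pixel coordinate of the centre of each BEV cell around the ego vehicle."""
    n = round(2 * half_range_m / cell_m)
    grid = []
    for i in range(n):  # i indexes ego x (forward)
        row_out = []
        for j in range(n):  # j indexes ego y (left)
            px = -half_range_m + (i + 0.5) * cell_m
            py = -half_range_m + (j + 0.5) * cell_m
            row_out.append(ego_to_satellite_px(px, py, pose, tile))
        grid.append(row_out)
    return grid
```

In a full pipeline these coordinates would presumably drive a bilinear resampling of the satellite image or its feature map. The temporal side of the registration (the satellite image predates the drive) is not modeled here.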

📝 Abstract
Existing vision-based 3D occupancy prediction methods are inherently limited in accuracy due to their exclusive reliance on street-view imagery, neglecting the potential benefits of incorporating satellite views. We propose SA-Occ, the first Satellite-Assisted 3D occupancy prediction model, which leverages GPS&IMU to integrate historical yet readily available satellite imagery into real-time applications, effectively mitigating limitations of ego-vehicle perception such as occlusions and degraded performance in distant regions. To address the core challenges of cross-view perception, we propose: 1) Dynamic-Decoupling Fusion, which resolves inconsistencies in dynamic regions caused by the temporal asynchrony between satellite and street views; 2) 3D-Proj Guidance, a module that enhances 3D feature extraction from inherently 2D satellite imagery; and 3) Uniform Sampling Alignment, which aligns the sampling density between street and satellite views. Evaluated on Occ3D-nuScenes, SA-Occ achieves state-of-the-art performance, especially among single-frame methods, with a 39.05% mIoU (a 6.97% improvement), while incurring only 6.93 ms of additional latency per frame. Our code and newly curated dataset are available at https://github.com/chenchen235/SA-Occ.
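
The headline metric, mIoU, averages the per-class intersection-over-union of voxel labels. A minimal pure-Python sketch follows. The function name `mean_iou`, the `ignore_index=255` convention, and skipping classes absent from both prediction and ground truth are illustrative assumptions; the official Occ3D-nuScenes evaluation uses its own class list and masking rules, which this sketch does not reproduce.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], gt: Sequence[int], num_classes: int,
             ignore_index: int = 255) -> float:
    """Mean IoU from flattened per-voxel labels (predictions assumed in [0, num_classes))."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, g in zip(pred, gt):
        if g == ignore_index:
            continue  # unlabeled voxel: excluded from every class
        if p == g:
            inter[g] += 1
            union[g] += 1
        else:
            union[p] += 1  # false positive for the predicted class
            union[g] += 1  # false negative for the true class
    # Only classes that appear in prediction or ground truth contribute.
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0
```

Because intersections and unions are accumulated before dividing, concatenating the labels of all evaluation frames yields a dataset-level score rather than an average of per-frame scores.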
Problem

Research questions and friction points this paper is trying to address.

Improves 3D occupancy prediction accuracy using satellite imagery.
Addresses limitations of street-view-only methods, such as occlusions and degraded performance in distant regions.
Enhances cross-view perception with dynamic fusion and alignment techniques.
Innovation

Methods, ideas, or system contributions that make the work stand out.

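Integrates historical satellite imagery into real-time perception via GPS & IMU
Dynamic-Decoupling Fusion for satellite/street-view temporal asynchrony
3D-Proj Guidance enhances 3D feature extraction from 2D satellite imagery
Uniform Sampling Alignment matches sampling density between street and satellite views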
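
Dynamic-Decoupling Fusion targets the fact that a historical satellite image can show vehicles that are no longer present, or miss ones that are. One simple way to express that idea, offered purely as an illustrative assumption and not as SA-Occ's actual module, is to down-weight satellite BEV features in cells that the live street view flags as dynamic. `decoupled_fuse` and `dynamic_prob` are hypothetical names.

```python
from typing import List

Feature = List[float]  # channel vector of one BEV cell


def decoupled_fuse(street: List[Feature], satellite: List[Feature],
                   dynamic_prob: List[float]) -> List[Feature]:
    """Fuse per-cell BEV features, trusting the stale satellite view only in static cells.

    dynamic_prob[k] in [0, 1] is a hypothetical estimate that cell k holds a
    dynamic object, e.g. predicted from the live street-view branch.
    """
    if not (len(street) == len(satellite) == len(dynamic_prob)):
        raise ValueError("all inputs must cover the same BEV cells")
    fused = []
    for s_feat, sat_feat, d in zip(street, satellite, dynamic_prob):
        # Clamp the probability, then scale the satellite contribution by (1 - d).
        w_sat = 1.0 - min(max(d, 0.0), 1.0)
        fused.append([s + w_sat * t for s, t in zip(s_feat, sat_feat)])
    return fused
```

Additive fusion leaves the street-view features intact everywhere, so cells flagged as fully dynamic fall back to street-view evidence alone.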
Authors

Chen Chen
Key Laboratory of Target Cognition and Application Technology, Aerospace Information Research Institute, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Zhirui Wang
Aerospace Information Research Institute, Chinese Academy of Sciences
Remote sensing image interpretation, target detection, target recognition
Taowei Sheng
Key Laboratory of Target Cognition and Application Technology, Aerospace Information Research Institute, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Yi Jiang
Key Laboratory of Target Cognition and Application Technology, Aerospace Information Research Institute, Chinese Academy of Sciences
Yundu Li
Key Laboratory of Target Cognition and Application Technology, Aerospace Information Research Institute, Chinese Academy of Sciences
Peirui Cheng
Key Laboratory of Target Cognition and Application Technology, Aerospace Information Research Institute, Chinese Academy of Sciences
Luning Zhang
Shanghai University (MS student)
Multi-modal reasoning
Kaiqiang Chen
Chinese Academy of Sciences
Semantic segmentation, Convolutional Neural Networks
Yanfeng Hu
Key Laboratory of Target Cognition and Application Technology, Aerospace Information Research Institute, Chinese Academy of Sciences
Xue Yang
Shanghai Jiao Tong University
Xian Sun
Aerospace Information Research Institute, Chinese Academy of Sciences
Remote Sensing, Computer Vision and Pattern Recognition, Artificial Intelligence