LEAP: Enhancing Vision-Based Occupancy Networks with Lightweight Spatio-Temporal Correlation

📅 2025-02-21
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the limited accuracy of visual occupancy networks under occlusion and sparse observations, this paper proposes a lightweight spatiotemporal correlation modeling method. The core innovation introduces a compact implicit-space tokenization mechanism and a tri-stream fusion architecture integrating pixel-, voxel-, and temporal-level features. Within a shared latent space, motion-aware feature alignment and cross-attention enable plug-and-play spatiotemporal association modeling. The method incurs negligible computational overhead (<0.5% parameter increase) and can be integrated with mainstream occupancy networks. Evaluated on the nuScenes benchmark, it achieves a +3.2 percentage point improvement in mean Intersection-over-Union (mIoU) while maintaining real-time inference at >15 FPS, improving both robustness and accuracy and effectively balancing performance and latency.

📝 Abstract
Vision-based occupancy networks provide an end-to-end solution for reconstructing the surrounding environment using semantic occupied voxels derived from multi-view images. This technique relies on effectively learning the correlation between pixel-level visual information and voxels. Despite recent advancements, occupancy results still suffer from limited accuracy due to occlusions and sparse visual cues. To address this, we propose a Lightweight Spatio-Temporal Correlation (LEAP) method, which significantly enhances the performance of existing occupancy networks with minimal computational overhead. LEAP can be seamlessly integrated into various baseline networks, enabling a plug-and-play application. LEAP operates in three stages: 1) it tokenizes information from recent baseline and motion features into a shared, compact latent space; 2) it establishes full correlation through a tri-stream fusion architecture; 3) it generates occupancy results that strengthen the baseline's output. Extensive experiments demonstrate the efficiency and effectiveness of our method, outperforming the latest baseline models. The source code and several demos are available in the supplementary material.
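The paper's code is not included here, but the three-stage pipeline described in the abstract (tokenize into a shared latent space, correlate streams via cross-attention, refine the baseline output) can be sketched in a toy form. Everything below is a hypothetical illustration, not the authors' implementation: the names (`tokenize`, `cross_attention`), shapes, and the use of a single shared projection are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def tokenize(features, proj):
    # Stage 1 (assumed): project features into a shared, compact latent space.
    return features @ proj

def cross_attention(q_tokens, kv_tokens, d):
    # Stage 2 (assumed): correlate one stream (queries) against another
    # (keys/values) with scaled dot-product attention.
    scores = (q_tokens @ kv_tokens.T) / np.sqrt(d)
    return softmax(scores, axis=-1) @ kv_tokens

# Toy dimensions: 64 voxel tokens, 128 pixel tokens, latent dim 32.
d_lat = 32
voxel_feats = rng.normal(size=(64, 48))   # hypothetical baseline voxel features
pixel_feats = rng.normal(size=(128, 48))  # hypothetical pixel-level features
proj = rng.normal(size=(48, d_lat)) / np.sqrt(48)

v_tok = tokenize(voxel_feats, proj)
p_tok = tokenize(pixel_feats, proj)

# Stage 3 (assumed): refined voxel tokens = baseline tokens plus a residual
# correlation term, strengthening rather than replacing the baseline output.
refined = v_tok + cross_attention(v_tok, p_tok, d_lat)
print(refined.shape)  # (64, 32)
```

In the actual method a third (temporal/motion) stream and motion-aware alignment would participate in the fusion; the sketch keeps two streams only to show the residual, plug-and-play shape of the idea.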
Problem

Research questions and friction points this paper is trying to address.

Enhance vision-based occupancy network accuracy
Address occlusion and sparse visual cues
Integrate lightweight spatio-temporal correlation
Innovation

Methods, ideas, or system contributions that make the work stand out.

lightweight spatio-temporal correlation
tri-stream fusion architecture
plug-and-play integration method
Fengcheng Yu
Sun Yat-sen University
Haoran Xu
Sun Yat-sen University
Canming Xia
Sun Yat-sen University
Guang Tan
School of Intelligent Systems Engineering, Sun Yat-sen University
Machine Learning · Mobile Computing · Networking