$M^2$-Occ: Resilient 3D Semantic Occupancy Prediction for Autonomous Driving with Incomplete Camera Inputs

📅 2026-03-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the performance degradation in 3D semantic occupancy prediction for autonomous driving caused by partial multi-view camera failures, such as occlusions or sensor malfunctions. To this end, the authors propose the $M^2$-Occ framework, which integrates a Multi-view Masked Reconstruction (MMR) module with a learnable category-level Feature Memory Module (FMM). This design reconstructs missing views in feature space while incorporating global semantic priors, thereby preserving both 3D geometric structure and semantic consistency. As the first method specifically designed for robust semantic occupancy prediction under missing views, $M^2$-Occ achieves significant gains on the SurroundOcc benchmark (built on nuScenes) under extreme missing-view conditions, improving IoU by 4.93% with the rear view missing and by 5.01% when five of the six views are absent, while remaining competitive when all views are available.

📝 Abstract
Semantic occupancy prediction enables dense 3D geometric and semantic understanding for autonomous driving. However, existing camera-based approaches implicitly assume complete surround-view observations, an assumption that rarely holds in real-world deployment due to occlusion, hardware malfunction, or communication failures. We study semantic occupancy prediction under incomplete multi-camera inputs and introduce $M^2$-Occ, a framework designed to preserve geometric structure and semantic coherence when views are missing. $M^2$-Occ addresses two complementary challenges. First, a Multi-view Masked Reconstruction (MMR) module leverages the spatial overlap among neighboring cameras to recover missing-view representations directly in the feature space. Second, a Feature Memory Module (FMM) introduces a learnable memory bank that stores class-level semantic prototypes. By retrieving and integrating these global priors, the FMM refines ambiguous voxel features, ensuring semantic consistency even when observational evidence is incomplete. We introduce a systematic missing-view evaluation protocol on the nuScenes-based SurroundOcc benchmark, encompassing both deterministic single-view failures and stochastic multi-view dropout scenarios. Under the safety-critical missing back-view setting, $M^2$-Occ improves the IoU by 4.93%. As the number of missing cameras increases, the robustness gap further widens; for instance, under the setting with five missing views, our method boosts the IoU by 5.01%. These gains are achieved without compromising full-view performance. The source code will be publicly released at https://github.com/qixi7up/M2-Occ.
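The abstract describes the FMM only at a high level: a learnable memory bank of class-level semantic prototypes that are retrieved and integrated to refine ambiguous voxel features. As a rough illustration of that retrieval-and-refine idea, here is a minimal NumPy sketch; the class name, the scaled-dot-product retrieval, and the residual integration are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class FeatureMemorySketch:
    """Hypothetical sketch of an FMM-style memory bank.

    Stores one D-dim prototype per semantic class (learnable in the real
    model; random here) and refines voxel features by attention-weighted
    retrieval of these global class priors.
    """
    def __init__(self, num_classes, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.prototypes = rng.standard_normal((num_classes, dim)) / np.sqrt(dim)

    def refine(self, voxel_feats):
        # voxel_feats: (N, D) flattened voxel features.
        # Scaled dot-product scores against each class prototype: (N, C).
        scores = voxel_feats @ self.prototypes.T / np.sqrt(voxel_feats.shape[-1])
        weights = softmax(scores, axis=-1)
        # Retrieved global priors, one blended prototype per voxel: (N, D).
        retrieved = weights @ self.prototypes
        # Residual integration: priors supplement, not replace, observations.
        return voxel_feats + retrieved
```

Under this reading, voxels whose features are ambiguous (e.g. reconstructed for a missing view) are pulled toward the prototypes of the classes they most resemble, which is one plausible way the retrieved priors could enforce semantic consistency when observational evidence is incomplete.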
Problem

Research questions and friction points this paper is trying to address.

semantic occupancy prediction
incomplete camera inputs
autonomous driving
missing views
3D scene understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic Occupancy Prediction
Incomplete Camera Inputs
Multi-view Masked Reconstruction
Feature Memory Module
Autonomous Driving