3D Occupancy Prediction with Low-Resolution Queries via Prototype-aware View Transformation

📅 2025-03-19

🤖 AI Summary

In real-time camera-based 3D occupancy prediction, low-resolution voxel queries lose spatial information, which degrades geometric and semantic fidelity. To address this, we propose ProtoOcc, a prototype-aware view transformation framework. Image segments are clustered into prototypes, and these 2D prototypes are mapped onto 3D voxel queries to encode high-level visual geometry that the reduced queries would otherwise miss. A multi-perspective decoding strategy then disentangles the densely compressed visual cues into a full 3D occupancy scene. On Occ3D and SemanticKITTI, the approach shows clear improvements over its baselines and remains competitive with them even when voxel resolution is reduced by 75%. The key contributions are: (1) a prototype-driven view transformation built on clustered image-segment representations; and (2) an efficient architecture that preserves structural and semantic integrity when voxel-query resolution is severely reduced.
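The prototype idea summarized above can be illustrated with a small sketch. It assumes plain k-means over per-pixel image features as a stand-in for the paper's segment clustering, and it assumes each voxel query already knows which pixel it projects to. The names `cluster_prototypes`, `enrich_queries`, `query_pixels`, and the mixing weight `alpha` are hypothetical and do not come from ProtoOcc. The sketch shows only the core step: pixels are grouped into segments, each segment is summarized by its mean feature (its prototype), and every low-resolution voxel query is shifted toward the prototype of the segment it lands in.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(vectors: List[Sequence[float]]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cluster_prototypes(
    pixel_feats: List[Vector], k: int, iters: int = 10, seed: int = 0
) -> Tuple[List[Vector], List[int]]:
    """Plain k-means over per-pixel features (a stand-in for segment clustering).

    Returns the k prototypes and, for every pixel, the index of its segment.
    """
    rng = random.Random(seed)
    prototypes = [list(f) for f in rng.sample(pixel_feats, k)]
    assign = [0] * len(pixel_feats)
    for _ in range(iters):
        # Assignment step: each pixel joins its nearest prototype.
        assign = [
            min(range(k), key=lambda c: _sq_dist(f, prototypes[c]))
            for f in pixel_feats
        ]
        # Update step: each prototype becomes the mean feature of its segment.
        for c in range(k):
            members = [f for f, a in zip(pixel_feats, assign) if a == c]
            if members:  # an empty segment keeps its previous prototype
                prototypes[c] = _mean(members)
    return prototypes, assign


def enrich_queries(
    query_feats: List[Vector],
    query_pixels: List[int],
    prototypes: List[Vector],
    assign: List[int],
    alpha: float = 0.5,
) -> List[Vector]:
    """Mix into each voxel query the prototype of the segment it projects onto.

    query_pixels[i] is the flat pixel index that voxel query i projects to
    (assumed here to be precomputed from camera geometry).
    """
    enriched = []
    for q, p in zip(query_feats, query_pixels):
        proto = prototypes[assign[p]]
        enriched.append([qi + alpha * pi for qi, pi in zip(q, proto)])
    return enriched
```

For example, with four 2D pixel features forming two well-separated groups, `cluster_prototypes(feats, k=2)` places the first two pixels in one segment and the last two in another, and a query projecting onto pixel 2 is pulled toward that segment's mean. A trained network would more plausibly learn how to combine query and prototype features than rely on a fixed additive mix like `alpha`.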

📝 Abstract

The resolution of voxel queries significantly influences the quality of view transformation in camera-based 3D occupancy prediction. However, computational constraints and the practical necessity for real-time deployment require smaller query resolutions, which inevitably leads to information loss. Therefore, it is essential to encode and preserve rich visual details within limited query sizes while ensuring a comprehensive representation of 3D occupancy. To this end, we introduce ProtoOcc, a novel occupancy network that leverages prototypes of clustered image segments in view transformation to enhance low-resolution context. In particular, the mapping of 2D prototypes onto 3D voxel queries encodes high-level visual geometries and compensates for the loss of spatial information from reduced query resolutions. Additionally, we design a multi-perspective decoding strategy to efficiently disentangle the densely compressed visual cues into a high-dimensional 3D occupancy scene. Experimental results on both Occ3D and SemanticKITTI benchmarks demonstrate the effectiveness of the proposed method, showing clear improvements over the baselines. More importantly, ProtoOcc achieves competitive performance against the baselines even with 75% reduced voxel resolution.
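The decoding step described in the abstract, unpacking compressed low-resolution features into a finer 3D scene, can be loosely illustrated with a channel-to-space rearrangement (a 3D-grid analogue of pixel shuffle). This is a swapped-in technique, not ProtoOcc's multi-perspective decoder. The sketch assumes each coarse voxel stores C*r*r channels and redistributes them to an r by r block of finer voxels in the horizontal plane, leaving the vertical axis unchanged; the function name `channel_to_space_xy` is hypothetical.

```python
from typing import List

# A grid indexed as grid[x][y][z], where each cell holds a feature vector.
Grid = List[List[List[List[float]]]]


def channel_to_space_xy(feats: Grid, r: int) -> Grid:
    """Rearrange an (X, Y, Z) grid with C*r*r channels per cell into an
    (X*r, Y*r, Z) grid with C channels per cell. Height Z is unchanged."""
    X, Y, Z = len(feats), len(feats[0]), len(feats[0][0])
    total = len(feats[0][0][0])
    if total % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    C = total // (r * r)
    # Pre-allocate the finer grid; every cell is filled exactly once below.
    out: Grid = [[[[] for _ in range(Z)] for _ in range(Y * r)] for _ in range(X * r)]
    for x in range(X):
        for y in range(Y):
            for z in range(Z):
                vec = feats[x][y][z]
                for i in range(r):
                    for j in range(r):
                        k = i * r + j  # index of the C-sized slice for sub-cell (i, j)
                        out[x * r + i][y * r + j][z] = vec[k * C:(k + 1) * C]
    return out
```

For instance, a single coarse cell holding `[0.0, 1.0, 2.0, 3.0]` with `r=2` becomes a 2 by 2 block whose cells hold `[0.0]`, `[1.0]`, `[2.0]`, and `[3.0]`. In a learned decoder, a layer producing the C*r*r channels would normally precede this rearrangement; it is omitted here so the index bookkeeping stays visible.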

Problem

Enhance 3D occupancy prediction with low-resolution voxel queries.

Preserve visual details within limited query sizes for real-time deployment.

Compensate for lost spatial information by mapping 2D image prototypes onto 3D voxel queries.

Innovation

Prototype-aware view transformation enhances low-resolution queries

Multi-perspective decoding disentangles compressed visual cues

ProtoOcc achieves competitive performance with reduced voxel resolution
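As a quick arithmetic check on what a "75% reduced voxel resolution" can mean, assuming it refers to the voxel count: halving both horizontal axes of a grid while keeping its height leaves one quarter of the voxels. The helper name `voxel_fraction` and the grid sizes below are hypothetical examples, not ProtoOcc's configuration.

```python
from math import prod
from typing import Sequence


def voxel_fraction(low: Sequence[int], full: Sequence[int]) -> float:
    """Fraction of voxels kept when a full-resolution grid is replaced by a lower-resolution one."""
    if len(low) != len(full):
        raise ValueError("grids must have the same number of axes")
    return prod(low) / prod(full)


# Hypothetical grids: halve both horizontal axes, keep the vertical axis.
kept = voxel_fraction((100, 100, 16), (200, 200, 16))
print(f"kept {kept:.0%}, reduced {1 - kept:.0%}")
```

Running it prints `kept 25%, reduced 75%`. Note that a 75% cut in voxel count corresponds to only a 50% cut along each horizontal axis, so a stated "resolution reduction" is worth reading against the grid shapes it refers to.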
