3D Occupancy Prediction with Low-Resolution Queries via Prototype-aware View Transformation

📅 2025-03-19
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
In real-time camera-based 3D occupancy prediction, low-resolution voxel queries severely degrade geometric and semantic fidelity. To address this, we propose a prototype-aware view transformation framework. Our method explicitly models high-order visual structures via a novel clustering-based image-segment prototype mechanism for view transformation. Additionally, we design a multi-view feature disentanglement module and a lightweight voxel decoder, enabling high-fidelity 3D occupancy reconstruction even at extremely low resolution: only 25% of the original voxel count. Evaluated on Occ3D and SemanticKITTI, our approach achieves performance competitive with full-resolution baselines despite this 75% reduction in voxel resolution, significantly outperforming existing real-time methods. The key contributions include: (1) the first prototype-driven view transformation leveraging clustered image-segment representations; and (2) an efficient, resolution-robust architecture that preserves structural and semantic integrity under severe voxel downsampling.

๐Ÿ“ Abstract
The resolution of voxel queries significantly influences the quality of view transformation in camera-based 3D occupancy prediction. However, computational constraints and the practical necessity for real-time deployment require smaller query resolutions, which inevitably leads to information loss. Therefore, it is essential to encode and preserve rich visual details within limited query sizes while ensuring a comprehensive representation of 3D occupancy. To this end, we introduce ProtoOcc, a novel occupancy network that leverages prototypes of clustered image segments in view transformation to enhance low-resolution context. In particular, the mapping of 2D prototypes onto 3D voxel queries encodes high-level visual geometries and compensates for the loss of spatial information from reduced query resolutions. Additionally, we design a multi-perspective decoding strategy to efficiently disentangle the densely compressed visual cues into a high-dimensional 3D occupancy scene. Experimental results on both the Occ3D and SemanticKITTI benchmarks demonstrate the effectiveness of the proposed method, showing clear improvements over the baselines. More importantly, ProtoOcc achieves competitive performance against the baselines even with 75% reduced voxel resolution.
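The prototype idea described in the abstract can be illustrated with a minimal sketch, which is NOT the authors' implementation: per-image pixel features are clustered into a few "segment prototypes" (here with a tiny k-means), and each low-resolution voxel query is then enriched with the prototype of the image region it projects to. All function names, shapes, and the additive fusion are hypothetical simplifications; the camera projection step is omitted.

```python
# Hypothetical sketch of prototype-aware query enhancement.
# Not the paper's code: k-means clustering and additive fusion
# are illustrative stand-ins for the actual mechanism.
import numpy as np

def cluster_prototypes(feats, k, iters=10, seed=0):
    """Cluster (N, C) image features into (k, C) prototypes via k-means.

    Returns the prototypes and the per-feature cluster labels.
    """
    rng = np.random.default_rng(seed)
    protos = feats[rng.choice(len(feats), size=k, replace=False)]
    labels = np.zeros(len(feats), dtype=int)
    for _ in range(iters):
        # Assign each feature to its nearest prototype.
        d = ((feats[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # Move each prototype to the mean of its assigned features.
        for j in range(k):
            mask = labels == j
            if mask.any():
                protos[j] = feats[mask].mean(0)
    return protos, labels

def enhance_queries(queries, protos, assign):
    """Add to each (Q, C) voxel query the prototype of the image
    segment its 2D projection falls into. `assign` (Q,) stands in
    for the projection step, which is omitted in this sketch."""
    return queries + protos[assign]
```

For example, clustering 256 random 8-dim features into 4 prototypes and adding them to 32 queries yields enhanced queries of the same `(32, 8)` shape, so the enrichment leaves the low query resolution untouched while injecting segment-level context.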
Problem

Research questions and friction points this paper is trying to address.

Enhance 3D occupancy prediction with low-resolution voxel queries.
Preserve visual details in limited query sizes for real-time deployment.
Improve spatial information encoding using 2D prototypes in 3D queries.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Prototype-aware view transformation enhances low-resolution queries.
Multi-perspective decoding disentangles compressed visual cues.
ProtoOcc achieves competitive performance with reduced voxel resolution.
Gyeongrok Oh, Korea University
Sungjune Kim, Korea University
Heeju Ko, Korea University
Hyung-Gun Chi, Purdue University
Jinkyu Kim, Korea University
Dongwook Lee, AI Center, DS Division, Samsung Electronics
Daehyun Ji, AI Center, DS Division, Samsung Electronics
Sungjoon Choi, Korea University (Robotics)
Sujin Jang, Principal Researcher, Samsung AI Center (DS Division) (Machine Learning, Robotics, Computer Vision, Human-Computer Interaction)
Sangpil Kim, Korea University (Computer Vision)