🤖 AI Summary
To address poor generalization in 3D panoramic occupancy prediction—caused by insufficient geometric accuracy and ambiguous instance boundaries—this paper proposes an end-to-end differentiable framework. Our method introduces a novel hybrid view transformation branch that jointly leverages 3D Gaussian depth representation and discrete depth bins for improved geometric modeling. Additionally, we incorporate a BEV edge prior with edge-aware supervision to significantly enhance boundary localization and instance separation. The framework unifies voxel-level semantic and instance identity prediction within a single architecture. Evaluated on Occ3D-nuScenes, our approach achieves new state-of-the-art performance: geometric reasoning error is reduced by 12.6%, while panoramic segmentation metrics—PQ, SQ, and RQ—improve by 3.8, 2.4, and 4.1 percentage points, respectively. These results demonstrate superior geometric consistency and instance robustness in complex urban driving scenes.
📝 Abstract
3D Panoptic Occupancy Prediction aims to reconstruct a dense volumetric scene map by predicting the semantic class and instance identity of every occupied region in 3D space. Achieving such fine-grained 3D understanding requires precise geometric reasoning and spatially consistent scene representation across complex environments. However, existing approaches often struggle to maintain precise geometry and capture the precise spatial range of 3D instances critical for robust panoptic separation. To overcome these limitations, we introduce HyGE-Occ, a novel framework that leverages a hybrid view-transformation branch with 3D Gaussian and edge priors to enhance both geometric consistency and boundary awareness in 3D panoptic occupancy prediction. HyGE-Occ employs a hybrid view-transformation branch that fuses a continuous Gaussian-based depth representation with a discretized depth-bin formulation, producing BEV features with improved geometric consistency and structural coherence. In parallel, we extract edge maps from BEV features and use them as auxiliary information to learn edge cues. In our extensive experiments on the Occ3D-nuScenes dataset, HyGE-Occ outperforms existing work, demonstrating superior 3D geometric reasoning.