🤖 AI Summary
To address the semantic ambiguity and geometric distortion caused by Gaussian overlap in 3D Gaussian representations, this paper proposes an occupancy prediction framework oriented toward embodied perception. Methodologically, we introduce the first Opacity-guided Self-Encoder (OSE) to mitigate semantic ambiguity, design a Geometry-aware Cross-Encoder (GCE) for fine-grained geometric modeling, and, for the first time, systematically establish a synergistic optimization mechanism between opacity and geometric attributes, incorporating opacity-aware feature disentanglement and a geometry-constrained loss. Our approach achieves state-of-the-art performance on the Occ-ScanNet and EmbodiedOcc-ScanNet datasets, outperforming prior methods by 8.47 points in IoU and 6.27 points in mIoU. This work establishes a novel, interpretable, and high-fidelity paradigm for joint geometric-semantic modeling grounded in 3D Gaussians.
📝 Abstract
3D occupancy prediction enables robots to obtain fine-grained spatial geometry and semantics of the surrounding scene, and has become an essential task for embodied perception. Existing methods that use 3D Gaussians instead of dense voxels do not effectively exploit the geometry and opacity properties of Gaussians, which limits both the network's estimation of complex environments and the ability of the 3D Gaussians to describe the scene. In this paper, we propose a 3D occupancy prediction method, dubbed RoboOcc, that enhances geometric and semantic scene understanding for robots. It utilizes an Opacity-guided Self-Encoder (OSE) to alleviate the semantic ambiguity of overlapping Gaussians and a Geometry-aware Cross-Encoder (GCE) to accomplish fine-grained geometric modeling of the surrounding scene. We conduct extensive experiments on the Occ-ScanNet and EmbodiedOcc-ScanNet datasets, and RoboOcc achieves state-of-the-art performance in both local and global camera settings. Further, in ablation studies of Gaussian parameters, RoboOcc outperforms state-of-the-art methods by large margins of 8.47 and 6.27 in the IoU and mIoU metrics, respectively. The code will be released soon.
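To make the semantic-ambiguity problem concrete: when several Gaussians overlap at a voxel, their semantic predictions can conflict, and opacity offers a natural cue for resolving the conflict. The following is a minimal conceptual sketch (not the paper's actual OSE architecture; the function name, shapes, and weighting scheme are illustrative assumptions) of fusing per-Gaussian semantic logits with opacity-derived weights:

```python
import numpy as np

def opacity_weighted_semantics(logits, opacities):
    """Fuse semantic predictions of overlapping Gaussians at a query point.

    Illustrative sketch only: each Gaussian covering the point contributes
    its semantic logits, weighted by its (normalized) opacity, so that
    near-transparent Gaussians barely influence the final class.

    logits:    (N, C) per-Gaussian semantic logits
    opacities: (N,)   per-Gaussian opacity in [0, 1]
    returns:   predicted class index at the query point
    """
    # Normalize opacities into convex weights (eps guards all-zero opacity).
    w = opacities / (opacities.sum() + 1e-8)
    # Opacity-weighted fusion of the semantic logits.
    fused = (w[:, None] * logits).sum(axis=0)
    return int(fused.argmax())
```

Under this weighting, an opaque Gaussian dominates a nearly transparent one even if the transparent Gaussian's logits are individually more confident, which is the intuition behind using opacity to disambiguate overlapping Gaussians.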