RoboOcc: Enhancing the Geometric and Semantic Scene Understanding for Robots

📅 2025-04-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the semantic ambiguity and geometric distortion caused by overlapping Gaussians in 3D Gaussian representations, this paper proposes RoboOcc, an occupancy prediction framework oriented toward embodied perception. Methodologically, it introduces an Opacity-guided Self-Encoder (OSE) to mitigate semantic ambiguity, designs a Geometry-aware Cross-Encoder (GCE) for fine-grained geometric modeling, and jointly optimizes opacity and geometric attributes, incorporating opacity-aware feature disentanglement and a geometry-constrained loss. The approach achieves state-of-the-art performance on Occ-ScanNet and EmbodiedOcc-ScanNet; in ablation studies of Gaussian parameters, it outperforms prior methods by 8.47 in IoU and 6.27 in mIoU. The work presents an interpretable, high-fidelity approach to joint geometric-semantic modeling grounded in 3D Gaussians.

📝 Abstract

3D occupancy prediction enables robots to obtain fine-grained spatial geometry and semantics of the surrounding scene, and has become an essential task for embodied perception. Existing methods that use 3D Gaussians instead of dense voxels do not effectively exploit the geometry and opacity properties of the Gaussians, which limits both the network's estimation of complex environments and how well the 3D Gaussians describe the scene. In this paper, we propose RoboOcc, a 3D occupancy prediction method that enhances geometric and semantic scene understanding for robots. It uses an Opacity-guided Self-Encoder (OSE) to alleviate the semantic ambiguity of overlapping Gaussians and a Geometry-aware Cross-Encoder (GCE) to achieve fine-grained geometric modeling of the surrounding scene. We conduct extensive experiments on the Occ-ScanNet and EmbodiedOcc-ScanNet datasets, and RoboOcc achieves state-of-the-art performance in both local and global camera settings. Further, in ablation studies of Gaussian parameters, RoboOcc outperforms state-of-the-art methods by a large margin of 8.47 and 6.27 in the IoU and mIoU metrics, respectively. The code will be released soon.
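The abstract reports results in IoU and mIoU without defining them. As a rough illustration, the sketch below follows the convention commonly used in semantic occupancy benchmarks: IoU scores occupied versus empty voxels regardless of class, and mIoU averages the per-class IoU over the non-empty classes. The function name `occupancy_scores`, the label values `EMPTY = 0` and `IGNORE = 255`, and the choice to skip classes that appear in neither prediction nor ground truth are assumptions for illustration; the source does not state the exact evaluation protocol.

```python
from typing import Dict, Sequence

EMPTY = 0     # assumed label for free space (not confirmed by the source)
IGNORE = 255  # assumed label for voxels excluded from evaluation


def occupancy_scores(pred: Sequence[int], gt: Sequence[int],
                     num_classes: int) -> Dict[str, float]:
    """Return geometric IoU and semantic mIoU for flattened voxel labels."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")

    # Drop voxels that the ground truth marks as ignored.
    pairs = [(p, g) for p, g in zip(pred, gt) if g != IGNORE]

    # Geometric IoU: occupied vs. empty, regardless of semantic class.
    inter = sum(1 for p, g in pairs if p != EMPTY and g != EMPTY)
    union = sum(1 for p, g in pairs if p != EMPTY or g != EMPTY)
    iou = inter / union if union else 0.0

    # Semantic mIoU: mean of per-class IoU over non-empty classes.
    # Classes absent from both pred and gt are skipped (an assumption).
    per_class = []
    for c in range(num_classes):
        if c == EMPTY:
            continue
        tp = sum(1 for p, g in pairs if p == c and g == c)
        fp = sum(1 for p, g in pairs if p == c and g != c)
        fn = sum(1 for p, g in pairs if p != c and g == c)
        denom = tp + fp + fn
        if denom:
            per_class.append(tp / denom)
    miou = sum(per_class) / len(per_class) if per_class else 0.0

    return {"IoU": iou, "mIoU": miou}


if __name__ == "__main__":
    # Toy example: six voxels, two semantic classes plus empty.
    gt = [0, 1, 1, 2, 255, 2]
    pred = [0, 1, 2, 2, 1, 0]
    print(occupancy_scores(pred, gt, num_classes=3))
```

Under this convention a model can score well on IoU while scoring poorly on mIoU if it locates occupied space correctly but assigns the wrong semantic class, which is why the two numbers are reported separately.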

Problem

Research questions and friction points this paper is trying to address.

Enhancing 3D occupancy prediction for robots

Improving geometric and semantic scene understanding

Addressing limitations of 3D Gaussian-based methods

Innovation

Methods, ideas, or system contributions that make the work stand out.

Opacity-guided Self-Encoder reduces semantic ambiguity

Geometry-aware Cross-Encoder improves fine-grained modeling

3D Gaussian-based occupancy prediction for robots
