RoboOcc: Enhancing the Geometric and Semantic Scene Understanding for Robots

📅 2025-04-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the semantic ambiguity and geometric distortion caused by overlapping Gaussians in 3D Gaussian representations, this paper proposes RoboOcc, an occupancy prediction framework for embodied perception. It introduces an Opacity-guided Self-Encoder (OSE) to mitigate semantic ambiguity and a Geometry-aware Cross-Encoder (GCE) for fine-grained geometric modeling, and it jointly optimizes the opacity and geometric attributes of the Gaussians, incorporating opacity-aware feature disentanglement and a geometry-constrained loss. The approach achieves state-of-the-art performance on Occ-ScanNet and EmbodiedOcc-ScanNet; in ablation studies of Gaussian parameters, it outperforms prior state-of-the-art methods by 8.47 in IoU and 6.27 in mIoU. The work offers an interpretable, high-fidelity approach to joint geometric-semantic modeling grounded in 3D Gaussians.

📝 Abstract
3D occupancy prediction enables robots to obtain the fine-grained spatial geometry and semantics of the surrounding scene, and has become an essential task for embodied perception. Existing methods that use 3D Gaussians instead of dense voxels do not effectively exploit the geometry and opacity properties of the Gaussians, which limits both the network's estimation of complex environments and the ability of 3D Gaussians to describe the scene. In this paper, we propose RoboOcc, a 3D occupancy prediction method that enhances geometric and semantic scene understanding for robots. It uses the Opacity-guided Self-Encoder (OSE) to alleviate the semantic ambiguity of overlapping Gaussians and the Geometry-aware Cross-Encoder (GCE) to accomplish fine-grained geometric modeling of the surrounding scene. We conduct extensive experiments on the Occ-ScanNet and EmbodiedOcc-ScanNet datasets, and RoboOcc achieves state-of-the-art performance in both local and global camera settings. Further, in ablation studies of Gaussian parameters, RoboOcc outperforms the state-of-the-art methods by a large margin of (8.47, 6.27) in the IoU and mIoU metrics, respectively. The code will be released soon.
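The IoU and mIoU figures quoted above can be read against the following minimal sketch of how these two metrics are commonly computed for voxel occupancy predictions. It is an illustration under stated assumptions, not the paper's evaluation code: the label `0` for empty space, the function names, and the toy arrays are all hypothetical, and the exact protocol used on Occ-ScanNet (for example, any visibility masking) is not described in this summary.

```python
import numpy as np

EMPTY = 0  # assumed label for free space (hypothetical convention)


def occupancy_iou(pred, gt, empty=EMPTY):
    """Geometric IoU: occupied vs. empty voxels, ignoring semantic class."""
    pred_occ = pred != empty
    gt_occ = gt != empty
    inter = np.logical_and(pred_occ, gt_occ).sum()
    union = np.logical_or(pred_occ, gt_occ).sum()
    return float(inter / union) if union > 0 else float("nan")


def semantic_miou(pred, gt, num_classes, empty=EMPTY):
    """Mean of per-class IoU over non-empty classes that appear in pred or gt."""
    ious = []
    for c in range(num_classes):
        if c == empty:
            continue
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return float(np.mean(ious)) if ious else float("nan")


# Toy flattened voxel grids with labels {0: empty, 1, 2}.
gt = np.array([0, 1, 1, 2, 2, 0])
pred = np.array([0, 1, 2, 2, 2, 1])
print(occupancy_iou(pred, gt), semantic_miou(pred, gt, num_classes=3))
```

IoU here measures only whether occupied space is recovered, while mIoU also penalizes assigning the wrong semantic class, which is why the two numbers are reported separately.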
Problem

Research questions and friction points this paper is trying to address.

Enhancing 3D occupancy prediction for robots
Improving geometric and semantic scene understanding
Addressing limitations of 3D Gaussian-based methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Opacity-guided Self-Encoder reduces semantic ambiguity
Geometry-aware Cross-Encoder improves fine-grained modeling
3D Gaussian-based occupancy prediction for robots
Zhang Zhang
Beijing Innovation Center of Humanoid Robotics, Beijing Institute of Technology
Qiang Zhang
Beijing Innovation Center of Humanoid Robotics, Hong Kong University of Science and Technology (Guangzhou)
Wei Cui
Beijing Innovation Center of Humanoid Robotics
Shuai Shi
Beijing Innovation Center of Humanoid Robotics
Yijie Guo
Beijing Innovation Center of Humanoid Robotics
Gang Han
Professor of Biostatistics, Texas A&M University
Statistics, Biostatistics, Medical research, Computer experiments
Wen Zhao
JSPS International Fellow, UT-Austin Postdoc, KAUST
MEMS, Sensor, Nonlinear Dynamics
Hengle Ren
Beijing Innovation Center of Humanoid Robotics
Renjing Xu
HKUST(GZ)
Brain-inspired Computing, Humanoid Computing
Jian Tang
Beijing Innovation Center of Humanoid Robotics