SPHERE: Semantic-PHysical Engaged REpresentation for 3D Semantic Scene Completion

📅 2025-09-14
🤖 AI Summary
Existing 3D semantic scene completion (SSC) methods face a fundamental trade-off: voxel- and plane-based approaches lack geometric fidelity and fail to encode physical regularities, while neural rendering methods such as NeRF and 3D Gaussian Splatting (3DGS) incur high computational cost, converge slowly, and deliver limited semantic accuracy in large-scale autonomous driving scenes. To address this, we propose a hybrid voxel-Gaussian representation framework that jointly exploits semantic understanding and physically grounded geometric modeling. The method combines a dual-branch 3D representation, semantic spherical harmonic encoding, Gaussian distribution alignment, and differentiable rendering. Its two key components are a Semantic-guided Gaussian Initialization (SGI) module and a Physical-aware Harmonics Enhancement (PHE) module. Evaluated on SemanticKITTI and SSCBench-KITTI-360, the approach is reported to outperform state-of-the-art voxel-based and NeRF-based SSC methods in both semantic accuracy and geometric realism.
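Both benchmarks score SSC at the voxel level, so "geometry" and "semantic accuracy" here correspond to two standard numbers: completion IoU (occupied vs. empty voxels) and mIoU over the semantic classes. The sketch below shows how these two metrics are typically computed on a voxel grid. It is a simplified, per-sample version (official evaluation accumulates statistics over a whole split), and the function name `ssc_metrics`, `empty_label=0`, and `ignore_label=255` are assumptions for illustration.

```python
import numpy as np


def ssc_metrics(pred, gt, num_classes, empty_label=0, ignore_label=255):
    """Completion IoU and semantic mIoU for one voxel grid (simplified sketch).

    pred, gt: integer arrays of identical shape with one class id per voxel.
    `empty_label` marks free space; voxels where gt == `ignore_label` are
    excluded. Both label conventions are assumptions for illustration.
    """
    valid = gt != ignore_label
    p, g = pred[valid], gt[valid]

    # Geometry: every non-empty class counts as "occupied".
    p_occ, g_occ = p != empty_label, g != empty_label
    union = np.logical_or(p_occ, g_occ).sum()
    inter = np.logical_and(p_occ, g_occ).sum()
    completion_iou = float(inter / union) if union > 0 else float("nan")

    # Semantics: IoU per non-empty class, averaged over classes that appear.
    ious = []
    for c in range(num_classes):
        if c == empty_label:
            continue
        union_c = np.logical_or(p == c, g == c).sum()
        if union_c == 0:
            continue  # class absent from both pred and gt in this sample
        inter_c = np.logical_and(p == c, g == c).sum()
        ious.append(inter_c / union_c)
    miou = float(np.mean(ious)) if ious else float("nan")
    return completion_iou, miou


pred = np.array([0, 1, 1, 2, 0])
gt = np.array([0, 1, 2, 2, 1])
print(ssc_metrics(pred, gt, num_classes=3))
```

Completion IoU ignores class identity entirely, which is why a method can improve geometry and semantics independently of each other.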

📝 Abstract
Camera-based 3D Semantic Scene Completion (SSC) is a critical task in autonomous driving systems, predicting voxel-level geometry and semantics for holistic scene perception. While existing voxel-based and plane-based SSC methods have achieved considerable progress, they struggle to capture the physical regularities needed for realistic geometric detail. Neural reconstruction methods such as NeRF and 3DGS, in contrast, show stronger physical awareness but suffer from high computational cost and slow convergence when handling large-scale, complex autonomous driving scenes, which leads to inferior semantic accuracy. To address these issues, we propose the Semantic-PHysical Engaged REpresentation (SPHERE) for camera-based SSC, which integrates voxel and Gaussian representations for joint exploitation of semantic and physical information. First, the Semantic-guided Gaussian Initialization (SGI) module leverages dual-branch 3D scene representations to locate focal voxels that serve as anchors for efficient Gaussian initialization. Then, the Physical-aware Harmonics Enhancement (PHE) module incorporates semantic spherical harmonics to model physical-aware contextual details and promotes semantic-geometry consistency through focal distribution alignment, generating SSC results with realistic details. Extensive experiments and analyses on the popular SemanticKITTI and SSCBench-KITTI-360 benchmarks validate the effectiveness of SPHERE. The code is available at https://github.com/PKU-ICST-MIPL/SPHERE_ACMMM2025.
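The abstract does not say how SGI scores voxels, so the following is only a hypothetical sketch of the general idea of using focal voxels as Gaussian anchors: take per-voxel class probabilities from two branches, treat voxels that at least one branch calls occupied and where the branches disagree most as "focal", and place one Gaussian at the center of each. The disagreement criterion (total variation distance), the initial scale and opacity, the voxel size, and the name `init_gaussians_from_focal_voxels` are all assumptions, not SPHERE's actual design.

```python
import numpy as np


def init_gaussians_from_focal_voxels(coords, probs_a, probs_b, k,
                                     voxel_size=0.2, origin=(0.0, 0.0, 0.0),
                                     empty_label=0):
    """Hypothetical SGI-style initialization (not SPHERE's published rule).

    coords: (N, 3) integer voxel indices.
    probs_a, probs_b: (N, C) per-voxel class probabilities from two branches.
    k: maximum number of Gaussians to create.
    """
    # A voxel is a candidate if either branch predicts a non-empty class.
    occupied = (probs_a.argmax(axis=1) != empty_label) | (
        probs_b.argmax(axis=1) != empty_label)

    # Assumed focal criterion: total variation distance between the branches.
    disagreement = 0.5 * np.abs(probs_a - probs_b).sum(axis=1)
    score = np.where(occupied, disagreement, -np.inf)

    # Keep at most k candidates, highest disagreement first.
    k = min(k, int(occupied.sum()))
    idx = np.argsort(-score, kind="stable")[:k]

    # One Gaussian per focal voxel, centered in the voxel.
    means = np.asarray(origin) + (coords[idx] + 0.5) * voxel_size
    return {
        "anchor_idx": idx,
        "means": means,
        "scales": np.full((k, 3), voxel_size / 2),          # isotropic, assumed
        "rotations": np.tile([1.0, 0.0, 0.0, 0.0], (k, 1)),  # identity quaternion
        "opacities": np.full(k, 0.1),                       # assumed init value
        "semantics": 0.5 * (probs_a[idx] + probs_b[idx]),   # fused class probs
    }
```

Anchoring Gaussians on a small set of voxels, rather than seeding them randomly across the scene, is one way to keep the Gaussian count bounded in large driving scenes, which matches the efficiency motivation stated in the abstract.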
Problem

Research questions and friction points this paper is trying to address.

Combining voxel and Gaussian representations for semantic scene completion
Improving geometric detail realism in autonomous driving scenes
Enhancing semantic accuracy while maintaining computational efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates voxel and Gaussian representations for SSC
Semantic-guided Gaussian initialization with focal voxels
Physical-aware harmonics enhancement for detail consistency
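Semantic spherical harmonics are not specified in detail on this page. One plausible reading, assumed in this sketch, mirrors how 3DGS stores view-dependent color: each Gaussian carries degree-1 real SH coefficients per semantic class instead of per RGB channel, and evaluating them along the viewing direction yields view-dependent class logits. The constants and sign convention follow common 3DGS implementations; the names `sh_basis_deg1` and `semantic_logits` are hypothetical.

```python
import numpy as np

# Real SH constants for degrees 0 and 1, as used in common 3DGS code.
SH_C0 = 0.28209479177387814
SH_C1 = 0.4886025119029199


def sh_basis_deg1(dirs):
    """Evaluate the 4 real SH basis functions (degree <= 1) for given directions."""
    d = dirs / np.linalg.norm(dirs, axis=-1, keepdims=True)  # normalize
    x, y, z = d[..., 0], d[..., 1], d[..., 2]
    return np.stack([np.full_like(x, SH_C0), -SH_C1 * y, SH_C1 * z, -SH_C1 * x],
                    axis=-1)


def semantic_logits(sh_coeffs, means, camera_center):
    """View-dependent class logits from per-class SH coefficients (assumed design).

    sh_coeffs: (G, C, 4) coefficients, one degree-1 SH expansion per class.
    means: (G, 3) Gaussian centers; camera_center: (3,) camera position.
    Returns (G, C) logits for the ray direction camera -> Gaussian.
    """
    basis = sh_basis_deg1(means - camera_center)      # (G, 4)
    return np.einsum("gck,gk->gc", sh_coeffs, basis)   # (G, C)
```

With only the degree-0 coefficient set, each Gaussian has a view-independent class score; the degree-1 terms let its semantics vary with viewpoint at the cost of four coefficients per class instead of one.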
Zhiwen Yang
Beihang University
Low-level Vision, AIGC, Medical Image Analysis
Yuxin Peng
Peking University, Wangxuan Institute of Computer Technology, Beijing, China