SPHERE: Semantic-PHysical Engaged REpresentation for 3D Semantic Scene Completion

📅 2025-09-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing 3D semantic scene completion (SSC) methods face a trade-off. Voxel- and plane-based approaches struggle to capture physical regularities and therefore lack realistic geometric detail, while neural reconstruction methods such as NeRF and 3D Gaussian Splatting (3DGS) are physically aware but incur high computational cost, converge slowly, and reach lower semantic accuracy in large-scale autonomous driving scenes. To address this, we propose SPHERE, a voxel-Gaussian hybrid representation for camera-based SSC that jointly exploits semantic and physical information. It has two modules. Semantic-guided Gaussian Initialization (SGI) uses dual-branch 3D scene representations to locate focal voxels, which serve as anchors for efficient Gaussian initialization. Physical-aware Harmonics Enhancement (PHE) uses semantic spherical harmonics to model contextual detail and promotes semantic-geometry consistency through focal distribution alignment. On SemanticKITTI and SSCBench-KITTI-360, the method is reported to outperform prior SSC methods in both semantic accuracy and geometric realism.

📝 Abstract

Camera-based 3D Semantic Scene Completion (SSC) is a critical task in autonomous driving systems, assessing voxel-level geometry and semantics for holistic scene perception. While existing voxel-based and plane-based SSC methods have achieved considerable progress, they struggle to capture physical regularities for realistic geometric details. On the other hand, neural reconstruction methods like NeRF and 3DGS demonstrate superior physical awareness, but suffer from high computational cost and slow convergence when handling large-scale, complex autonomous driving scenes, leading to inferior semantic accuracy. To address these issues, we propose the Semantic-PHysical Engaged REpresentation (SPHERE) for camera-based SSC, which integrates voxel and Gaussian representations for joint exploitation of semantic and physical information. First, the Semantic-guided Gaussian Initialization (SGI) module leverages dual-branch 3D scene representations to locate focal voxels as anchors to guide efficient Gaussian initialization. Then, the Physical-aware Harmonics Enhancement (PHE) module incorporates semantic spherical harmonics to model physical-aware contextual details and promote semantic-geometry consistency through focal distribution alignment, generating SSC results with realistic details. Extensive experiments and analyses on the popular SemanticKITTI and SSCBench-KITTI-360 benchmarks validate the effectiveness of SPHERE. The code is available at https://github.com/PKU-ICST-MIPL/SPHERE_ACMMM2025.
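To make the idea of focal voxels acting as anchors for Gaussian initialization more concrete, here is a minimal NumPy sketch. It is not the released SPHERE code: the function name `init_gaussians_from_focal_voxels`, the top-k confidence rule for choosing focal voxels, and the half-voxel isotropic scale are all assumptions chosen for illustration. The abstract does not specify how focal voxels are selected.

```python
import numpy as np

def init_gaussians_from_focal_voxels(occ_prob, sem_logits, voxel_size, origin, top_k):
    """Hypothetical helper: turn the most confident voxels into Gaussian seeds.

    occ_prob:   (X, Y, Z) occupancy probabilities from a voxel branch
    sem_logits: (X, Y, Z, C) per-voxel class logits
    voxel_size: edge length of a voxel in metres
    origin:     (3,) world position of the grid corner
    top_k:      number of focal voxels to keep (must be >= 1)
    """
    flat = occ_prob.ravel()
    k = min(top_k, flat.size)
    # Assumed selection rule: keep the k voxels with the highest occupancy.
    idx = np.argpartition(flat, -k)[-k:]
    ijk = np.stack(np.unravel_index(idx, occ_prob.shape), axis=1)  # (k, 3)

    means = origin + (ijk + 0.5) * voxel_size    # Gaussian centres at voxel centres
    scales = np.full((k, 3), 0.5 * voxel_size)   # isotropic, half a voxel wide
    opacities = flat[idx]                        # reuse occupancy as initial opacity
    semantics = sem_logits.reshape(-1, sem_logits.shape[-1])[idx]
    return means, scales, opacities, semantics
```

Seeding Gaussians only at selected voxels, rather than at every cell of the grid, keeps the number of primitives small, which is one plausible way to read the "efficient Gaussian initialization" the abstract mentions.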

Problem

Research questions and friction points this paper is trying to address.

Combining voxel and Gaussian representations for semantic scene completion

Improving geometric detail realism in autonomous driving scenes

Enhancing semantic accuracy while maintaining computational efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates voxel and Gaussian representations for SSC

Semantic-guided Gaussian initialization with focal voxels

Physical-aware harmonics enhancement for detail consistency
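As a rough illustration of what semantic spherical harmonics could look like, the sketch below evaluates a degree-1 real spherical harmonic basis for a viewing direction and uses it to weight per-class coefficients, in the same way 3DGS uses spherical harmonics for view-dependent colour. Treating the coefficients as class logits is an assumption about the PHE module, not something the summary confirms. The names `sh_basis_deg1` and `semantic_logits` are hypothetical, and the sign convention follows the common 3DGS implementation.

```python
import numpy as np

SH_C0 = 0.28209479177387814  # 1 / (2 * sqrt(pi))
SH_C1 = 0.4886025119029199   # sqrt(3 / (4 * pi))

def sh_basis_deg1(dirs):
    """Real SH basis up to degree 1 for directions of shape (N, 3) -> (N, 4)."""
    d = dirs / np.linalg.norm(dirs, axis=-1, keepdims=True)
    x, y, z = d[..., 0], d[..., 1], d[..., 2]
    return np.stack(
        [np.full_like(x, SH_C0), -SH_C1 * y, SH_C1 * z, -SH_C1 * x], axis=-1
    )

def semantic_logits(sh_coeffs, dirs):
    """Assumed formulation: per-class SH coefficients give view-dependent logits.

    sh_coeffs: (N, C, 4) coefficients per Gaussian and class
    dirs:      (N, 3) viewing direction per Gaussian (need not be normalised)
    returns:   (N, C) class logits
    """
    basis = sh_basis_deg1(dirs)
    return np.einsum("ncb,nb->nc", sh_coeffs, basis)
```

With all higher-order coefficients set to zero, the logits reduce to a view-independent constant per class, so the degree-0 term alone behaves like an ordinary per-Gaussian semantic vector.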

🔎 Similar Papers

Semi-supervised 3D Semantic Scene Completion with 2D Vision Foundation Model Guidance