🤖 AI Summary
Existing camera-based 3D semantic scene completion (SSC) methods face a fundamental trade-off. Voxel- and plane-based approaches lack geometric fidelity and fail to capture physical regularities. Neural reconstruction methods such as NeRF and 3D Gaussian Splatting (3DGS) are more physically aware, but they incur high computational cost, converge slowly, and deliver limited semantic accuracy in large-scale autonomous driving scenes. To address this, we propose SPHERE, a hybrid voxel-Gaussian representation that jointly exploits semantic information and physically grounded geometry. The framework combines dual-branch 3D scene representations, semantic spherical harmonics, focal distribution alignment between semantics and geometry, and differentiable rendering. Its two key components are Semantic-guided Gaussian Initialization (SGI), which uses focal voxels as anchors for efficient Gaussian initialization, and Physical-aware Harmonics Enhancement (PHE), which models physically plausible contextual details. On SemanticKITTI and SSCBench-KITTI-360, the approach outperforms state-of-the-art voxel-based and NeRF-based SSC methods in both semantic accuracy and geometric realism.
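Neither the summary nor the abstract specifies how the semantic spherical harmonics are parameterized. The sketch below is a minimal illustration that assumes each Gaussian stores one set of real spherical-harmonic (SH) coefficients per semantic class, analogous to the per-channel colour SH in 3DGS. Under that assumption, class logits can be evaluated for a viewing direction. The degree-1 basis constants and sign convention follow the public 3DGS reference code. The coefficient layout and the functions `sh_basis_deg1`, `semantic_logits`, and `softmax` are hypothetical and are not taken from SPHERE's code.

```python
import math

# Degree-0 and degree-1 real SH constants, as used in the public 3DGS code.
SH_C0 = 0.28209479177387814
SH_C1 = 0.4886025119029199


def sh_basis_deg1(direction):
    """Return the 4 real SH basis values for a viewing direction (x, y, z)."""
    x, y, z = direction
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    x, y, z = x / norm, y / norm, z / norm  # normalize to a unit vector
    return [SH_C0, -SH_C1 * y, SH_C1 * z, -SH_C1 * x]


def semantic_logits(coeffs, direction):
    """coeffs[c][k] is SH coefficient k of semantic class c (hypothetical layout)."""
    basis = sh_basis_deg1(direction)
    return [sum(c * b for c, b in zip(class_coeffs, basis)) for class_coeffs in coeffs]


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy Gaussian with 3 classes: class 0 is favoured when viewed along +z,
# class 1 along -z, and class 2 is direction-independent (all-zero coefficients).
coeffs = [
    [1.0, 0.0, 2.0, 0.0],
    [1.0, 0.0, -2.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
up = softmax(semantic_logits(coeffs, (0.0, 0.0, 1.0)))
down = softmax(semantic_logits(coeffs, (0.0, 0.0, -1.0)))
print("+z:", [round(p, 3) for p in up])
print("-z:", [round(p, 3) for p in down])
```

With only the degree-0 term, each class logit would be a constant. The degree-1 terms let the predicted class distribution vary smoothly with direction, and this sketch exists to expose that property.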
📝 Abstract
Camera-based 3D Semantic Scene Completion (SSC) is a critical task in autonomous driving systems, predicting voxel-level geometry and semantics for holistic scene perception. While existing voxel-based and plane-based SSC methods have achieved considerable progress, they struggle to capture the physical regularities needed for realistic geometric details. On the other hand, neural reconstruction methods like NeRF and 3DGS demonstrate superior physical awareness, but suffer from high computational cost and slow convergence when handling large-scale, complex autonomous driving scenes, leading to inferior semantic accuracy. To address these issues, we propose the Semantic-PHysical Engaged REpresentation (SPHERE) for camera-based SSC, which integrates voxel and Gaussian representations to jointly exploit semantic and physical information. First, the Semantic-guided Gaussian Initialization (SGI) module leverages dual-branch 3D scene representations to locate focal voxels as anchors to guide efficient Gaussian initialization. Then, the Physical-aware Harmonics Enhancement (PHE) module incorporates semantic spherical harmonics to model physical-aware contextual details and promotes semantic-geometry consistency through focal distribution alignment, generating SSC results with realistic details. Extensive experiments and analyses on the popular SemanticKITTI and SSCBench-KITTI-360 benchmarks validate the effectiveness of SPHERE. The code is available at https://github.com/PKU-ICST-MIPL/SPHERE_ACMMM2025.
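A toy sketch can illustrate the idea behind SGI. The abstract states only that dual-branch representations locate focal voxels, which then act as anchors for Gaussian initialization. Every concrete choice below is an assumption made for clarity:

- the selection rule (mean occupancy of the two branches at or above a threshold, with the class taken from the more confident branch);
- one Gaussian per focal voxel, placed at the voxel centre with an isotropic scale of half a voxel;
- the opacity-weighted density query used to read semantics back at a point.

The names `select_focal_voxels`, `init_gaussians`, `query_semantics`, and `Gaussian` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian:
    mean: tuple      # (x, y, z) centre in metres
    scale: float     # isotropic standard deviation in metres (a simplification)
    opacity: float   # initialized from the anchor voxel's occupancy
    label: int       # semantic class inherited from the anchor voxel


def select_focal_voxels(branch_a, branch_b, threshold=0.5):
    """Pick focal voxels from two branch predictions (hypothetical rule).

    Each branch maps a voxel index (i, j, k) to (occupancy_prob, class_id).
    A voxel is focal when the mean occupancy of the two branches reaches
    `threshold`; its class comes from the branch with higher occupancy.
    """
    anchors = []
    for idx in sorted(set(branch_a) | set(branch_b)):
        pa, ca = branch_a.get(idx, (0.0, -1))
        pb, cb = branch_b.get(idx, (0.0, -1))
        occ = 0.5 * (pa + pb)
        if occ >= threshold:
            anchors.append((idx, occ, ca if pa >= pb else cb))
    return anchors


def init_gaussians(anchors, voxel_size, origin=(0.0, 0.0, 0.0)):
    """Place one Gaussian at the centre of each focal voxel."""
    gaussians = []
    for idx, occ, label in anchors:
        centre = tuple(o + (n + 0.5) * voxel_size for o, n in zip(origin, idx))
        gaussians.append(Gaussian(centre, 0.5 * voxel_size, occ, label))
    return gaussians


def query_semantics(gaussians, point, num_classes):
    """Sum opacity-weighted Gaussian densities per class at a 3D point."""
    scores = [0.0] * num_classes
    for g in gaussians:
        d2 = sum((p - m) ** 2 for p, m in zip(point, g.mean))
        scores[g.label] += g.opacity * math.exp(-0.5 * d2 / g.scale ** 2)
    return scores


# Toy 3-voxel row with two branch predictions (occupancy, class id).
branch_a = {(0, 0, 0): (0.9, 1), (1, 0, 0): (0.2, 2), (2, 0, 0): (0.7, 2)}
branch_b = {(0, 0, 0): (0.8, 1), (1, 0, 0): (0.3, 2), (2, 0, 0): (0.5, 3)}

anchors = select_focal_voxels(branch_a, branch_b)
gaussians = init_gaussians(anchors, voxel_size=0.2)
scores = query_semantics(gaussians, (0.1, 0.1, 0.1), num_classes=4)
print(anchors)
print([round(s, 4) for s in scores])
```

In this example only voxels (0, 0, 0) and (2, 0, 0) become anchors, so the number of Gaussians grows with the number of focal voxels rather than with the full grid. That is the kind of saving the abstract points to when it describes anchor-guided initialization as efficient.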