🤖 AI Summary
Existing camera-based 3D Semantic Scene Completion (SSC) methods rely on a single coupled encoder to supply both semantic and geometric priors, which forces a trade-off between conflicting demands and limits performance. FoundationSSC addresses this with dual decoupling: at the source level, a foundation encoder provides semantic feature priors for the semantic branch and stereo cost volumes for the geometric branch; at the pathway level, these priors are refined in separate, specialised pathways. A hybrid view transformation then turns the refined inputs into complementary 3D features, which a novel Axis-Aware Fusion (AAF) module merges anisotropically into a unified representation. On SemanticKITTI, FoundationSSC surpasses the prior best by +0.23 mIoU and +2.03 IoU; on SSCBench-KITTI-360, it achieves state-of-the-art results of 21.78 mIoU and 48.61 IoU.
📝 Abstract
Camera-based 3D semantic scene completion (SSC) provides dense geometric and semantic perception for autonomous driving and robotic navigation. However, existing methods rely on a coupled encoder to deliver both semantic and geometric priors, which forces the model to trade off conflicting demands and limits its overall performance. To address this limitation, we propose FoundationSSC, a novel framework that performs dual decoupling at both the source and pathway levels. At the source level, we introduce a foundation encoder that provides rich semantic feature priors for the semantic branch and high-fidelity stereo cost volumes for the geometric branch. At the pathway level, these priors are refined through specialised, decoupled pathways, yielding superior semantic context and depth distributions. Our dual-decoupling design produces disentangled and refined inputs, which are then utilised by a hybrid view transformation to generate complementary 3D features. We further introduce a novel Axis-Aware Fusion (AAF) module that addresses the often-overlooked challenge of fusing these features by anisotropically merging them into a unified representation. Extensive experiments demonstrate the advantages of FoundationSSC, which improves semantic and geometric metrics simultaneously, surpassing the previous best results by +0.23 mIoU and +2.03 IoU on SemanticKITTI. It also achieves state-of-the-art performance on SSCBench-KITTI-360, with 21.78 mIoU and 48.61 IoU. The code will be released upon acceptance.
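The two reported metrics measure different things: in SSC evaluation, IoU is usually computed on binary occupancy (geometry), while mIoU averages per-class IoU over the semantic classes. The sketch below illustrates that distinction on flattened voxel label grids. It is not the benchmark's official evaluation code: the function name `ssc_scores`, the label conventions (`0` for empty, `255` for ignored voxels), and the choice to average only over classes that actually occur are all assumptions made for illustration.

```python
from typing import Sequence, Tuple

EMPTY = 0     # assumed label for free space
IGNORE = 255  # assumed label for voxels excluded from evaluation


def ssc_scores(
    pred: Sequence[int], target: Sequence[int], num_classes: int
) -> Tuple[float, float]:
    """Return (IoU, mIoU) for two flattened voxel label grids.

    IoU scores binary occupancy (occupied vs. empty).
    mIoU averages the per-class IoU over semantic classes
    1..num_classes-1 that occur in either pred or target.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of voxels")

    occ_inter = 0
    occ_union = 0
    inter = [0] * num_classes
    union = [0] * num_classes

    for p, t in zip(pred, target):
        if t == IGNORE:
            continue  # skip voxels without a valid ground-truth label

        # Geometric term: treat every non-empty label as "occupied".
        p_occ, t_occ = p != EMPTY, t != EMPTY
        occ_inter += int(p_occ and t_occ)
        occ_union += int(p_occ or t_occ)

        # Semantic term: per-class intersection and union counts.
        if p == t:
            inter[p] += 1
            union[p] += 1
        else:
            union[p] += 1
            union[t] += 1

    iou = occ_inter / occ_union if occ_union else 0.0
    per_class = [inter[c] / union[c] for c in range(1, num_classes) if union[c]]
    miou = sum(per_class) / len(per_class) if per_class else 0.0
    return iou, miou
```

Because the occupancy term ignores which class a voxel carries, a model can raise IoU without changing mIoU, and vice versa, which is why the abstract reports gains on both. Benchmark toolkits typically accumulate the counts over the whole evaluation set before dividing, and may average over a fixed class list rather than only the classes present.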