Towards Temporal Fusion Beyond the Field of View for Camera-based Semantic Scene Completion

📅 2025-11-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing camera-based 3D semantic scene completion (SSC) methods struggle to reconstruct regions outside the current camera's field of view, such as the areas beside the ego-vehicle. Their temporal fusion mainly enriches in-frame regions and underuses the context that historical frames hold about these unseen areas. To address this, we propose C3DFusion (Current-Centric Contextual 3D Fusion), a module that explicitly aligns 3D-lifted point features from the current and historical frames. It combines two techniques: historical context blurring, which attenuates the scale of inaccurately warped historical point features to suppress their noise, and current-centric feature densification, which increases the volumetric contribution of current point features. Integrated into standard SSC architectures, C3DFusion significantly outperforms state-of-the-art methods on SemanticKITTI and SSCBench-KITTI-360, and it also yields notable gains when applied to other baseline models.

📝 Abstract

Recent camera-based 3D semantic scene completion (SSC) methods have increasingly explored leveraging temporal cues to enrich the features of the current frame. However, these approaches primarily focus on enhancing in-frame regions and often struggle to reconstruct critical out-of-frame areas near the sides of the ego-vehicle, even though previous frames commonly contain valuable contextual information about these unseen regions. To address this limitation, we propose the Current-Centric Contextual 3D Fusion (C3DFusion) module, which generates hidden-region-aware 3D feature geometry by explicitly aligning 3D-lifted point features from both current and historical frames. C3DFusion performs enhanced temporal fusion through two complementary techniques, historical context blurring and current-centric feature densification. The former suppresses noise from inaccurately warped historical point features by attenuating their scale, and the latter enhances current point features by increasing their volumetric contribution. Simply integrated into standard SSC architectures, C3DFusion demonstrates strong effectiveness, significantly outperforming state-of-the-art methods on the SemanticKITTI and SSCBench-KITTI-360 datasets. Furthermore, it exhibits robust generalization, achieving notable performance gains when applied to other baseline models.
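
The alignment of historical and current 3D-lifted points can be pictured with a minimal sketch. It assumes each frame comes with a known 4x4 ego pose (an ego-to-world rigid transform) and that points live in an ego frame with x forward and y left. The function names (`rigid_inverse`, `warp_points`, `pose`) and these conventions are illustrative assumptions, not details taken from the paper.

```python
import math


def rigid_inverse(T):
    """Invert a 4x4 rigid transform [R | t] as [R^T | -R^T t]."""
    R_t = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [T[i][3] for i in range(3)]
    new_t = [-sum(R_t[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [R_t[i] + [new_t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def matmul4(A, B):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def warp_points(points, world_from_hist, world_from_cur):
    """Express points from a historical ego frame in the current ego frame."""
    # cur_from_hist = inverse(world_from_cur) @ world_from_hist
    M = matmul4(rigid_inverse(world_from_cur), world_from_hist)
    return [
        tuple(M[i][0] * x + M[i][1] * y + M[i][2] * z + M[i][3] for i in range(3))
        for x, y, z in points
    ]


def pose(x, y, yaw):
    """Hypothetical helper: planar ego pose (translation plus yaw) as a 4x4 matrix."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, x],
            [s, c, 0.0, y],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


# The vehicle drove 5 m forward between the historical and the current frame.
hist_points = [(6.0, 2.0, 0.0)]
warped = warp_points(hist_points, pose(0.0, 0.0, 0.0), pose(5.0, 0.0, 0.0))
print(warped)  # [(1.0, 2.0, 0.0)]
```

In this toy case, a point seen 6 m ahead and 2 m to the left in the historical frame ends up 1 m ahead and 2 m to the left of the current ego position, which is the kind of side region that the current camera may no longer observe.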

Problem

Research questions and friction points this paper is trying to address.

Reconstructing out-of-frame areas in camera-based semantic scene completion

Aligning 3D features from current and historical frames for temporal fusion

Suppressing noise from inaccurately warped historical point features

Innovation

Methods, ideas, or system contributions that make the work stand out.

Aligns 3D-lifted point features from current and historical frames

Uses historical context blurring to suppress noise from warped features

Applies current-centric feature densification to increase the volumetric contribution of current point features
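
The interplay of the three ideas listed above can be illustrated with a toy voxel fusion. This is only one possible reading, not the paper's formulation: it assumes scalar point features, models "attenuating the scale" of historical features as a constant down-weighting factor (`hist_scale`), and models the larger "volumetric contribution" of current features as splatting each current point into a small cube of neighbouring voxels (`radius`). All names and parameter values are hypothetical.

```python
import math
from collections import defaultdict


def voxel_index(point, voxel_size):
    """Map a 3D point to integer voxel coordinates."""
    return tuple(int(math.floor(c / voxel_size)) for c in point)


def fuse(current, historical, voxel_size=0.2, hist_scale=0.5, radius=1):
    """Fuse (point, feature) pairs into a sparse voxel grid.

    Both inputs are lists of ((x, y, z), feature), with historical points
    already warped into the current ego frame.
    """
    acc = defaultdict(float)   # weighted feature sum per voxel
    wsum = defaultdict(float)  # weight sum per voxel

    # Historical features: single voxel, attenuated weight (assumed scheme).
    for p, f in historical:
        v = voxel_index(p, voxel_size)
        acc[v] += hist_scale * f
        wsum[v] += hist_scale

    # Current features: full weight, spread over a (2*radius+1)^3 cube.
    for p, f in current:
        cx, cy, cz = voxel_index(p, voxel_size)
        for dx in range(-radius, radius + 1):
            for dy in range(-radius, radius + 1):
                for dz in range(-radius, radius + 1):
                    v = (cx + dx, cy + dy, cz + dz)
                    acc[v] += f
                    wsum[v] += 1.0

    # Weighted average per occupied voxel.
    return {v: acc[v] / wsum[v] for v in acc}


current = [((0.1, 0.1, 0.1), 1.0)]
historical = [((0.1, 0.1, 0.1), 0.0),   # overlaps the current point
              ((1.1, 0.1, 0.1), 4.0)]   # region with no current observation
fused = fuse(current, historical)
```

Under these assumptions, voxels observed only in the past keep their historical feature, while in overlapping voxels the current feature dominates because the historical one carries a smaller weight.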
