CurriFlow: Curriculum-Guided Depth Fusion with Optical Flow-Based Temporal Alignment for 3D Semantic Scene Completion

2025-10-14
Citations: 0
Influential: 0
AI Summary
Existing monocular 3D semantic scene completion (SSC) methods rely on temporal stacking or depth projection, lacking explicit motion modeling and thus struggling with occlusions and noisy depth supervision. To address these limitations, we propose CurriFlow: (1) a flow-guided multi-level temporal feature alignment module to enhance dynamic object perception and temporal stability; (2) a curriculum learning strategy that progressively transitions supervision from high-accuracy sparse LiDAR depth to dense yet noisy stereo depth, mitigating supervision mismatch; and (3) integration of SAM to provide category-agnostic voxel-level semantic priors, improving semantic consistency. Evaluated on SemanticKITTI, CurriFlow achieves 16.9% mIoU, state-of-the-art performance among purely camera-driven SSC methods.
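For readers unfamiliar with the metric, the following is a minimal sketch of how mean IoU over voxel labels can be computed. It is a simplification and not the benchmark's official evaluation code: the function name `mean_iou`, the flat label lists, and the `ignore_label=255` convention are assumptions made for illustration, and the sketch averages over every class that appears, whereas a benchmark protocol may exclude certain classes.

```python
def mean_iou(pred, target, num_classes, ignore_label=255):
    """Mean intersection-over-union over flat lists of integer class labels.

    Voxels whose target equals `ignore_label` (an assumed convention) are skipped.
    Classes that never appear in pred or target are left out of the average.
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_label:
            continue
        if p == t:
            # Correct voxel: counts once in both intersection and union.
            inter[t] += 1
            union[t] += 1
        else:
            # Wrong voxel: enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [inter[c] / union[c] for c in range(num_classes) if union[c] > 0]
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy example with three classes and four voxels.
    print(mean_iou([0, 1, 1, 2], [0, 1, 2, 2], num_classes=3))
```

In the toy example, class 0 has IoU 1.0 and classes 1 and 2 each have IoU 0.5, so the mean is 2/3.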

Abstract
Semantic Scene Completion (SSC) aims to infer complete 3D geometry and semantics from monocular images, serving as a crucial capability for camera-based perception in autonomous driving. However, existing SSC methods relying on temporal stacking or depth projection often lack explicit motion reasoning and struggle with occlusions and noisy depth supervision. We propose CurriFlow, a novel semantic occupancy prediction framework that integrates optical flow-based temporal alignment with curriculum-guided depth fusion. CurriFlow employs a multi-level fusion strategy to align segmentation, visual, and depth features across frames using pre-trained optical flow, thereby improving temporal consistency and dynamic object understanding. To enhance geometric robustness, a curriculum learning mechanism progressively transitions from sparse yet accurate LiDAR depth to dense but noisy stereo depth during training, ensuring stable optimization and seamless adaptation to real-world deployment. Furthermore, semantic priors from the Segment Anything Model (SAM) provide category-agnostic supervision, strengthening voxel-level semantic learning and spatial consistency. Experiments on the SemanticKITTI benchmark demonstrate that CurriFlow achieves state-of-the-art performance with a mean IoU of 16.9, validating the effectiveness of our motion-guided and curriculum-aware design for camera-based 3D semantic scene completion.
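The curriculum idea described above can be illustrated with a small sketch. The abstract does not specify the schedule or the fusion rule, so everything here is an assumption for illustration: a linear ramp after a warm-up period, a per-pixel convex blend, and the hypothetical names `curriculum_weight`, `fuse_depth`, and `warmup_frac`.

```python
def curriculum_weight(step, total_steps, warmup_frac=0.2):
    """Stereo weight in [0, 1]: 0 during warm-up, then a linear ramp to 1.

    The linear shape and the warm-up fraction are assumptions, not the paper's schedule.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    warmup = warmup_frac * total_steps
    if step <= warmup:
        return 0.0
    if step >= total_steps:
        return 1.0
    return (step - warmup) / (total_steps - warmup)


def fuse_depth(lidar, stereo, alpha):
    """Blend sparse LiDAR depth with dense stereo depth, pixel by pixel.

    `lidar` holds floats, or None where no LiDAR return exists;
    `stereo` holds a float for every pixel.
    """
    fused = []
    for l, s in zip(lidar, stereo):
        if l is None:
            # No LiDAR measurement here: stereo is the only available signal.
            fused.append(s)
        else:
            fused.append((1.0 - alpha) * l + alpha * s)
    return fused


if __name__ == "__main__":
    alpha = curriculum_weight(step=60, total_steps=100)
    print(alpha, fuse_depth([10.0, None], [12.0, 8.0], alpha))
```

Early in training the weight is 0, so pixels with LiDAR returns rely on LiDAR alone; by the end the weight is 1 and the target is pure stereo depth, which matches the input available at deployment.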
Problem

Research questions and friction points this paper is trying to address.

Improves 3D semantic scene completion with optical flow alignment
Enhances geometric robustness using curriculum-guided depth fusion
Strengthens semantic learning with category-agnostic segmentation priors
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optical flow aligns multi-level features across frames
Curriculum learning fuses sparse LiDAR and dense stereo depth
Semantic priors from SAM enhance voxel-level learning
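The flow-based alignment in the first point can be sketched as backward warping of a previous-frame feature map. This is a minimal single-channel illustration in pure Python, not the paper's module: the flow convention (each current-frame pixel stores the displacement to its match in the previous frame), the border clamping, and the names `bilinear_sample` and `warp_with_flow` are all assumptions.

```python
import math


def bilinear_sample(feat, x, y):
    """Bilinearly sample a 2D feature map (list of rows) at a real-valued (x, y)."""
    h, w = len(feat), len(feat[0])
    # Clamp to the image border so out-of-range lookups reuse edge values.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1.0 - ax) * feat[y0][x0] + ax * feat[y0][x1]
    bottom = (1.0 - ax) * feat[y1][x0] + ax * feat[y1][x1]
    return (1.0 - ay) * top + ay * bottom


def warp_with_flow(prev_feat, flow):
    """Align prev_feat to the current frame.

    flow[y][x] = (u, v) is assumed to point from current pixel (x, y)
    to its corresponding location (x + u, y + v) in the previous frame.
    """
    h, w = len(prev_feat), len(prev_feat[0])
    return [
        [bilinear_sample(prev_feat, x + flow[y][x][0], y + flow[y][x][1])
         for x in range(w)]
        for y in range(h)
    ]


if __name__ == "__main__":
    prev = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
    flow = [[(1.0, 0.0)] * 3 for _ in range(2)]
    print(warp_with_flow(prev, flow))
```

Applying the same warp to segmentation, visual, and depth feature maps would give the multi-level alignment the summary describes; in practice a batched tensor operation would replace these loops.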
Authors
Jinzhou Lin
School of Artificial Intelligence, Beijing University of Posts and Telecommunications, China
Jie Zhou
School of Artificial Intelligence, Beijing University of Posts and Telecommunications, China
Wenhao Xu
Unknown affiliation
Rongtao Xu
MBZUAI << CASIA << HUST
Intelligent Robot, Embodied AI, VLA, VLM, Spatialtemporal AI
Changwei Wang
Shandong Computer Science Center
Multimodal Learning, Embodied AI, Edge Intelligent Computing, AI for Healthcare, Safety Alignment
Shunpeng Chen
School of Artificial Intelligence, Beijing University of Posts and Telecommunications, China
Kexue Fu
City University of Hong Kong
HCI, Storytelling, Creativity, Cognition, Human-AI collaboration
Yihua Shao
Institute of Automation, Chinese Academy of Sciences, China
Li Guo
School of Artificial Intelligence, Beijing University of Posts and Telecommunications, China
Shibiao Xu
Beijing University of Posts and Telecommunications
Computer Vision, Machine Learning, Computer Graphics