RAFT-MSF++: Temporal Geometry-Motion Feature Fusion for Self-Supervised Monocular Scene Flow

📅 2026-04-21
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing monocular scene flow methods typically rely on two-frame inputs, which limits their ability to model temporal dynamics and to handle occluded regions robustly. To address this, the authors propose a self-supervised multi-frame recurrent framework that compactly encodes coupled geometry and motion cues in a Geometry-Motion Feature (GMF). The approach combines relative positional attention with an occlusion regularization module to support temporal feature fusion and iterative refinement. On the KITTI Scene Flow benchmark, the method achieves a 24.14% SF-all error, a 30.99% improvement over the baseline, with better robustness in occluded regions.
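As a quick sanity check on the two reported numbers, the snippet below back-computes the baseline error they would imply. It assumes the 30.99% figure is a relative reduction of the SF-all error, which the summary does not state explicitly; the function name `implied_baseline` is purely illustrative.

```python
def implied_baseline(error_pct: float, relative_improvement_pct: float) -> float:
    """Return the baseline error implied by a reported error and a relative improvement.

    Assumption (not stated in the source): improvement = (baseline - error) / baseline.
    """
    return error_pct / (1.0 - relative_improvement_pct / 100.0)


# Reported values: 24.14% SF-all, 30.99% improvement over the baseline.
baseline = implied_baseline(24.14, 30.99)
print(f"{baseline:.2f}")  # prints 34.98
```

Under that assumption the baseline would sit at roughly 34.98% SF-all; if the improvement were measured differently, this figure would not apply.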

📝 Abstract
Monocular scene flow estimation aims to recover dense 3D motion from image sequences, yet most existing methods are limited to two-frame inputs, restricting temporal modeling and robustness to occlusions. We propose RAFT-MSF++, a self-supervised multi-frame framework that recurrently fuses temporal features to jointly estimate depth and scene flow. Central to our approach is the Geometry-Motion Feature (GMF), which compactly encodes coupled motion and geometry cues and is iteratively updated for effective temporal reasoning. To ensure the robustness of this temporal fusion against occlusions, we incorporate relative positional attention to inject spatial priors and an occlusion regularization module to propagate reliable motion from visible regions. These components enable the GMF to effectively propagate information even in ambiguous areas. Extensive experiments show that RAFT-MSF++ achieves 24.14% SF-all on the KITTI Scene Flow benchmark, with a 30.99% improvement over the baseline and better robustness in occluded regions. The code is available at https://github.com/sunzunyi/RAFT-MSF-PlusPlus.
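For readers unfamiliar with the reported metric, the following is a minimal sketch of how an SF-all style outlier percentage can be computed, following the KITTI convention that a pixel counts as an outlier when its error exceeds both 3 px and 5% of the ground-truth magnitude, and as a scene flow outlier when it fails in any of D1, D2, or optical flow. The function names and array layout are assumptions for illustration and are not taken from the authors' code.

```python
import numpy as np


def outlier_mask(err, gt_mag, abs_thresh=3.0, rel_thresh=0.05):
    """Mark pixels whose error exceeds both the absolute and the relative threshold."""
    return (err > abs_thresh) & (err > rel_thresh * gt_mag)


def sf_all(d1_err, d1_gt, d2_err, d2_gt, fl_err, fl_gt_mag, valid):
    """Percentage of valid pixels that are outliers in D1, D2, or optical flow.

    d1_err, d2_err: absolute disparity errors for the first and second frame.
    fl_err:         optical flow end-point error.
    *_gt, fl_gt_mag: ground-truth disparity values / flow magnitudes.
    valid:          boolean mask of pixels that have ground truth.
    """
    d1_bad = outlier_mask(d1_err, d1_gt)
    d2_bad = outlier_mask(d2_err, d2_gt)
    fl_bad = outlier_mask(fl_err, fl_gt_mag)
    sf_bad = (d1_bad | d2_bad | fl_bad) & valid
    return 100.0 * sf_bad.sum() / valid.sum()
```

Because a pixel fails as soon as any one of the three components fails, SF-all is at least as large as each of the individual D1, D2, and flow outlier rates computed over the same pixels.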
Problem

Research questions and friction points this paper is trying to address.

monocular scene flow
temporal modeling
occlusion robustness
self-supervised learning
multi-frame input
Innovation

Methods, ideas, or system contributions that make the work stand out.

Geometry-Motion Feature
Temporal Fusion
Self-Supervised Scene Flow
Occlusion Regularization
Relative Positional Attention