RAFT-MSF++: Temporal Geometry-Motion Feature Fusion for Self-Supervised Monocular Scene Flow

📅 2026-04-21

🤖 AI Summary

Existing monocular scene flow methods typically rely on two-frame inputs, which limits how well they model temporal dynamics and handle occluded regions. To address this, the authors propose a self-supervised multi-frame recurrent framework that jointly estimates depth and scene flow and compactly encodes coupled geometry and motion cues in a Geometry-Motion Feature (GMF). The approach combines relative positional attention with an occlusion regularization module to support temporal feature fusion and iterative refinement. On the KITTI Scene Flow benchmark, the method achieves 24.14% SF-all error, a 30.99% improvement over the baseline, with better robustness in occluded regions.

📝 Abstract

Monocular scene flow estimation aims to recover dense 3D motion from image sequences, yet most existing methods are limited to two-frame inputs, restricting temporal modeling and robustness to occlusions. We propose RAFT-MSF++, a self-supervised multi-frame framework that recurrently fuses temporal features to jointly estimate depth and scene flow. Central to our approach is the Geometry-Motion Feature (GMF), which compactly encodes coupled motion and geometry cues and is iteratively updated for effective temporal reasoning. To ensure the robustness of this temporal fusion against occlusions, we incorporate relative positional attention to inject spatial priors and an occlusion regularization module to propagate reliable motion from visible regions. These components enable the GMF to effectively propagate information even in ambiguous areas. Extensive experiments show that RAFT-MSF++ achieves 24.14% SF-all on the KITTI Scene Flow benchmark, with a 30.99% improvement over the baseline and better robustness in occluded regions. The code is available at https://github.com/sunzunyi/RAFT-MSF-PlusPlus.
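For readers unfamiliar with the SF-all number quoted above, the following is a minimal sketch of how a KITTI-style scene flow outlier rate is usually computed. It assumes the standard KITTI convention (a pixel counts as an outlier when its error exceeds both 3 px and 5% of the ground-truth magnitude, in either disparity map or in the optical flow). The function names and the toy inputs are hypothetical and are not taken from the RAFT-MSF++ code.

```python
from typing import Sequence


def is_outlier(err_px: float, gt_mag: float,
               abs_thresh: float = 3.0, rel_thresh: float = 0.05) -> bool:
    """KITTI-style rule: error must exceed BOTH 3 px and 5% of the GT magnitude."""
    return err_px > abs_thresh and err_px > rel_thresh * gt_mag


def sf_all(d1_err: Sequence[float], d1_gt: Sequence[float],
           d2_err: Sequence[float], d2_gt: Sequence[float],
           fl_err: Sequence[float], fl_gt: Sequence[float]) -> float:
    """Percentage of pixels that are outliers in D1, D2, or Fl.

    Inputs are per-pixel absolute errors and ground-truth magnitudes for
    the first-frame disparity (D1), second-frame disparity (D2), and
    optical flow (Fl), restricted to pixels with valid ground truth.
    """
    n = len(d1_err)
    if n == 0:
        raise ValueError("no valid pixels")
    bad = 0
    for i in range(n):
        if (is_outlier(d1_err[i], d1_gt[i])
                or is_outlier(d2_err[i], d2_gt[i])
                or is_outlier(fl_err[i], fl_gt[i])):
            bad += 1
    return 100.0 * bad / n


# Toy example with four pixels (made-up values, for illustration only).
toy_score = sf_all(
    d1_err=[1.0, 4.0, 0.5, 1.0], d1_gt=[10.0, 20.0, 30.0, 40.0],
    d2_err=[4.0, 1.0, 0.5, 1.0], d2_gt=[100.0, 20.0, 30.0, 40.0],
    fl_err=[0.5, 0.5, 2.0, 10.0], fl_gt=[5.0, 5.0, 8.0, 50.0],
)
```

A pixel is counted once even if it fails in several components, so SF-all is always at least as large as each of the individual D1, D2, and Fl outlier rates. Lower is better.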

Problem

Research questions and friction points this paper is trying to address.

monocular scene flow

temporal modeling

occlusion robustness

self-supervised learning

multi-frame input

Innovation

Methods, ideas, or system contributions that make the work stand out.

Geometry-Motion Feature

Temporal Fusion

Self-Supervised Scene Flow