Flow4R: Unifying 4D Reconstruction and Tracking with Scene Flow

📅 2026-02-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the coupling of geometry and motion in dynamic 3D scenes by proposing a unified 4D reconstruction and tracking method centered on camera-space scene flow. Built on a Vision Transformer, the approach processes two input views symmetrically through a shared decoder and jointly predicts 3D point positions, bidirectional scene flow, pose weights, and confidence scores in a single forward pass, with no explicit pose regressor or bundle adjustment. Trained end-to-end, the model handles static and dynamic scene elements in the same way and reports state-of-the-art performance on 4D reconstruction and tracking, which supports scene flow as a central representation for spatiotemporal scene understanding.

📝 Abstract

Reconstructing and tracking dynamic 3D scenes remains a fundamental challenge in computer vision. Existing approaches often decouple geometry from motion: multi-view reconstruction methods assume static scenes, while dynamic tracking frameworks rely on explicit camera pose estimation or separate motion models. We propose Flow4R, a unified framework that treats camera-space scene flow as the central representation linking 3D structure, object motion, and camera motion. Flow4R predicts a minimal per-pixel property set (3D point position, scene flow, pose weight, and confidence) from two-view inputs using a Vision Transformer. This flow-centric formulation allows local geometry and bidirectional motion to be inferred symmetrically with a shared decoder in a single forward pass, without requiring explicit pose regressors or bundle adjustment. Trained jointly on static and dynamic datasets, Flow4R achieves state-of-the-art performance on 4D reconstruction and tracking tasks, demonstrating the effectiveness of the flow-central representation for spatiotemporal scene understanding.
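
To make concrete how camera-space scene flow can link structure, camera motion, and object motion, here is a minimal sketch. The abstract does not say how Flow4R turns its per-pixel outputs into a camera pose, so this is an assumption: it uses a standard weighted Kabsch (Procrustes) fit, with the pose weights treated as per-point confidences that a point is static. The function name `weighted_rigid_fit` and all the synthetic data are hypothetical and are not taken from the paper.

```python
import numpy as np

def weighted_rigid_fit(points, flow, weights):
    """Weighted Kabsch fit: find R, t minimizing
    sum_i w_i * ||R p_i + t - (p_i + f_i)||^2.
    An illustrative stand-in, not the paper's stated procedure."""
    w = weights / weights.sum()
    src = points
    dst = points + flow                        # where each point lands after the flow
    src_mean = (w[:, None] * src).sum(axis=0)
    dst_mean = (w[:, None] * dst).sum(axis=0)
    src_c = src - src_mean
    dst_c = dst - dst_mean
    H = (w[:, None] * src_c).T @ dst_c         # 3x3 weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_mean - R @ src_mean
    return R, t

# Synthetic scene: 200 points in front of the camera (hypothetical data).
rng = np.random.default_rng(0)
n = 200
points = rng.uniform(-1.0, 1.0, size=(n, 3)) + np.array([0.0, 0.0, 3.0])

# Assumed ground-truth rigid motion of the scene in camera coordinates.
angle = np.deg2rad(10.0)
R_true = np.array([
    [np.cos(angle), 0.0, np.sin(angle)],
    [0.0, 1.0, 0.0],
    [-np.sin(angle), 0.0, np.cos(angle)],
])
t_true = np.array([0.1, -0.05, 0.2])

# For static points, camera-space scene flow is induced by camera motion alone.
flow = points @ R_true.T + t_true - points

# The first 40 points belong to a moving object: extra motion on top.
dynamic = np.arange(n) < 40
flow[dynamic] += np.array([0.5, 0.0, 0.0])

# Stand-in for predicted pose weights: 0 on dynamic points, 1 on static ones.
weights = np.where(dynamic, 0.0, 1.0)

R_est, t_est = weighted_rigid_fit(points, flow, weights)

# Residual flow once the camera-induced part is removed: the object motion.
object_motion = flow - (points @ R_est.T + t_est - points)
```

In this toy setup the weights exclude the moving points, so the fit recovers the camera-induced rigid motion and the residual flow isolates the object motion. With learned, soft weights the same fit would still apply, only with approximate rather than exact separation.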

Problem

Research questions and friction points this paper is trying to address.

4D reconstruction

scene flow

dynamic 3D scenes

object tracking

spatiotemporal understanding

Innovation

Methods, ideas, or system contributions that make the work stand out.

scene flow

4D reconstruction

unified framework