Flux4D: Flow-based Unsupervised 4D Reconstruction

📅 2025-12-02
📈 Citations: 1
Influential: 0
🤖 AI Summary
Existing NeRF and 3D Gaussian Splatting (3DGS) methods for large-scale dynamic 4D reconstruction suffer from limited scalability and reliance on explicit motion annotations, while self-supervised approaches lack cross-scene generalizability and require laborious hyperparameter tuning. Method: We propose the first fully unsupervised, cross-scene trainable 4D Gaussian reconstruction framework—requiring no pretraining, geometric priors, or motion supervision, and taking only multi-view video frames as input. It jointly optimizes the spatial distribution and rigid/non-rigid motion trajectories of 3D Gaussians via photometric consistency and a “prefer-static” regularization, enabling automatic disentanglement of dynamic elements. Results: Our method achieves state-of-the-art performance on outdoor driving datasets, with single-frame inference in seconds. It delivers high-fidelity 4D reconstructions, strong scalability, and superior cross-scene generalization—without scene-specific adaptation.

📝 Abstract
Reconstructing large-scale dynamic scenes from visual observations is a fundamental challenge in computer vision, with critical implications for robotics and autonomous systems. While recent differentiable rendering methods such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have achieved impressive photorealistic reconstruction, they suffer from scalability limitations and require annotations to decouple actor motion. Existing self-supervised methods attempt to eliminate explicit annotations by leveraging motion cues and geometric priors, yet they remain constrained by per-scene optimization and sensitivity to hyperparameter tuning. In this paper, we introduce Flux4D, a simple and scalable framework for 4D reconstruction of large-scale dynamic scenes. Flux4D directly predicts 3D Gaussians and their motion dynamics to reconstruct sensor observations in a fully unsupervised manner. By adopting only photometric losses and enforcing an "as static as possible" regularization, Flux4D learns to decompose dynamic elements directly from raw data, without requiring pre-trained supervised models or foundational priors, simply by training across many scenes. Our approach enables efficient reconstruction of dynamic scenes within seconds, scales effectively to large datasets, and generalizes well to unseen environments, including rare and unknown objects. Experiments on outdoor driving datasets show Flux4D significantly outperforms existing methods in scalability, generalization, and reconstruction quality.
Problem

Research questions and friction points this paper is trying to address.

Reconstructing large-scale dynamic scenes from visual observations alone
Eliminating the need for motion annotations in 4D reconstruction
Scaling efficiently to large datasets and generalizing to unseen environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Predicts 3D Gaussians and their motion dynamics in a fully unsupervised manner
Uses only photometric losses and an "as static as possible" regularization
Trains across scenes for scalability and generalization
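The objective sketched in the points above combines a photometric reconstruction term with a "prefer-static" penalty on predicted per-Gaussian motion, so that Gaussians move only when doing so reduces photometric error. A minimal illustrative sketch, assuming an L1 photometric term and an L2 motion-magnitude penalty (the function name, loss weighting, and tensor shapes are assumptions for illustration, not the paper's actual implementation):

```python
import numpy as np

def flux4d_style_loss(rendered, target, motion_offsets, lambda_static=0.01):
    """Hypothetical sketch of the unsupervised objective described above.

    rendered, target: (H, W, 3) rendered and observed images.
    motion_offsets:   (N, 3) predicted per-Gaussian motion displacements.
    """
    # Photometric consistency: L1 distance between rendered and observed pixels.
    photometric = np.abs(rendered - target).mean()
    # "As static as possible" regularization: penalize the magnitude of each
    # Gaussian's predicted motion, so motion is used only where it pays off
    # photometrically -- this drives the static/dynamic decomposition.
    static_reg = np.linalg.norm(motion_offsets, axis=-1).mean()
    return photometric + lambda_static * static_reg
```

Because the regularizer is minimized at zero motion, gradient descent keeps Gaussians static by default and assigns motion only to regions where the photometric term demands it, which is the disentanglement mechanism the summary describes.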
Jingkang Wang
University of Toronto
Computer Vision · Robotics · Machine Learning
Henry Che
Waabi, UIUC
Yun Chen
Waabi, University of Toronto
Ze Yang
Waabi, University of Toronto
Lily Goli
Waabi, University of Toronto
S. Manivasagam
Waabi, University of Toronto
R. Urtasun
Waabi, University of Toronto