FLAG-4D: Flow-Guided Local-Global Dual-Deformation Model for 4D Reconstruction

📅 2026-02-09
🤖 AI Summary
Existing methods struggle to reconstruct complex point motions and fine time-varying details in dynamic scenes with high fidelity, especially from sparse input views. This work proposes a dual-deformation network that warps a canonical set of 3D Gaussians over time, pairing an Instantaneous Deformation Network (IDN) for fine-grained local deformations with a Global Motion Network (GMN) for long-range dynamics. To integrate motion cues from adjacent timeframes, it fuses dense features from a pretrained optical flow backbone and aligns them with each evolving Gaussian through a deformation-guided attention mechanism. The authors report higher-fidelity, more temporally coherent reconstructions with finer detail preservation than state-of-the-art methods.

📝 Abstract
We introduce FLAG-4D, a novel framework for generating novel views of dynamic scenes by reconstructing how 3D Gaussian primitives evolve through space and time. Existing methods typically rely on a single Multilayer Perceptron (MLP) to model temporal deformations, and they often struggle to capture complex point motions and fine-grained dynamic details consistently over time, especially from sparse input views. Our approach, FLAG-4D, overcomes this by employing a dual-deformation network that dynamically warps a canonical set of 3D Gaussians over time into new positions and anisotropic shapes. This dual-deformation network consists of an Instantaneous Deformation Network (IDN) for modeling fine-grained, local deformations and a Global Motion Network (GMN) for capturing long-range dynamics, refined through mutual learning. To ensure these deformations are both accurate and temporally smooth, FLAG-4D incorporates dense motion features from a pretrained optical flow backbone. We fuse these motion cues from adjacent timeframes and use a deformation-guided attention mechanism to align this flow information with the current state of each evolving 3D Gaussian. Extensive experiments demonstrate that FLAG-4D achieves higher-fidelity and more temporally coherent reconstructions with finer detail preservation than state-of-the-art methods.
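
The data flow described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the layer sizes, the untrained random weights, the function names (`warp_gaussians`, `attend_to_flow`), the use of plain scaled dot-product attention, and the choice to add the local and global offsets are all assumptions made for illustration. The sketch only warps Gaussian centers; the anisotropic shape updates and the mutual learning between the two networks are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(n_in, n_hidden, n_out):
    """Random weights for a two-layer perceptron (untrained, illustrative only)."""
    return (rng.normal(0.0, 0.1, (n_in, n_hidden)),
            rng.normal(0.0, 0.1, (n_hidden, n_out)))

def mlp(x, weights):
    """Two-layer perceptron with a tanh hidden layer and no biases."""
    w1, w2 = weights
    return np.tanh(x @ w1) @ w2

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attend_to_flow(queries, flow_tokens):
    """Scaled dot-product attention: each Gaussian attends over flow tokens."""
    scores = queries @ flow_tokens.T / np.sqrt(queries.shape[-1])
    attn = softmax(scores)                      # (n, num_tokens), rows sum to 1
    return attn @ flow_tokens, attn

def warp_gaussians(canonical_xyz, t, flow_prev, flow_next, params):
    """Warp canonical Gaussian centers to time t (hypothetical stand-in)."""
    n = canonical_xyz.shape[0]
    time_col = np.full((n, 1), t)

    # Stand-in for the Global Motion Network: coarse, long-range offset.
    x_t = np.concatenate([canonical_xyz, time_col], axis=1)
    global_offset = mlp(x_t, params["global"])
    coarse_xyz = canonical_xyz + global_offset

    # Fuse flow features from adjacent timeframes by stacking their tokens.
    flow_tokens = np.concatenate([flow_prev, flow_next], axis=0)

    # Assumption: the attention query comes from the coarsely deformed state.
    state = np.concatenate([coarse_xyz, time_col], axis=1)
    queries = mlp(state, params["query"])
    context, attn = attend_to_flow(queries, flow_tokens)

    # Stand-in for the Instantaneous Deformation Network: local residual.
    local_in = np.concatenate([state, context], axis=1)
    local_offset = mlp(local_in, params["local"])
    return coarse_xyz + local_offset, attn

feat_dim = 8  # assumed size of one optical flow feature token
params = {
    "global": init_mlp(4, 16, 3),
    "query": init_mlp(4, 16, feat_dim),
    "local": init_mlp(4 + feat_dim, 16, 3),
}
canonical_xyz = rng.normal(size=(5, 3))          # 5 canonical Gaussian centers
flow_prev = rng.normal(size=(6, feat_dim))       # placeholder flow features
flow_next = rng.normal(size=(6, feat_dim))
warped_xyz, attn = warp_gaussians(canonical_xyz, 0.5, flow_prev, flow_next, params)
print(warped_xyz.shape, attn.shape)
```

In this sketch the flow features are random placeholders; in practice they would come from the pretrained optical flow backbone the abstract mentions, and the three weight sets would be trained jointly against rendering losses.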
Problem

Research questions and friction points this paper is trying to address.

4D reconstruction
dynamic scenes
temporal deformation
sparse views
motion coherence
Innovation

Methods, ideas, or system contributions that make the work stand out.

dual-deformation network
4D reconstruction
3D Gaussian primitives
optical flow guidance
temporal coherence
Authors

Guan Yuan Tan
Monash University

Ngoc Tuan Vu
Monash University

Arghya Pal
Monash University

Sailaja Rajanala
Monash University Malaysia
NLP, Causality, Representation Learning

Raphael Phan C.-W.
Monash University

Mettu Srinivas
National Institute of Technology Warangal

Chee-Ming Ting
Associate Professor, Monash University. PhD (Maths - Statistics)
Statistical Signal Processing, Machine Learning, Biomedical Signals, Neuroimaging, Time Series Analysis