FLAG-4D: Flow-Guided Local-Global Dual-Deformation Model for 4D Reconstruction

📅 2026-02-09

🤖 AI Summary

Existing methods struggle to reconstruct complex point motions and fine time-varying details in dynamic scenes with high fidelity, especially from sparse viewpoints. This work proposes a dual-deformation architecture built on a canonical 3D Gaussian representation. It jointly models spatiotemporal deformations with a local Instantaneous Deformation Network (IDN) and a Global Motion Network (GMN). To integrate motion cues across frames, the approach fuses features from a pretrained optical flow backbone and aligns them with each Gaussian through a deformation-guided attention mechanism. According to the authors, the method outperforms current state-of-the-art 4D reconstruction techniques, with improvements in both the recovery of dynamic details and temporal consistency.
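
The dual-deformation idea can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes that each canonical Gaussian is described by a mean, a log-scale and a rotation quaternion, and that the two branches are tiny random, untrained MLPs (the hypothetical `idn` and `gmn`) that take (x, y, z, t) as input. Their per-Gaussian offsets are simply summed, which is also an assumption. The warped scale and rotation then define the anisotropic covariance Sigma = R S S^T R^T used in 3D Gaussian splatting.

```python
import numpy as np

# Fixed seed so the untrained stand-in weights are reproducible.
rng = np.random.default_rng(0)


def make_mlp(in_dim, hidden, out_dim):
    """Hypothetical 2-layer ReLU MLP with random, untrained weights."""
    w1 = rng.normal(0.0, 0.1, (in_dim, hidden))
    w2 = rng.normal(0.0, 0.1, (hidden, out_dim))
    return lambda x: np.maximum(x @ w1, 0.0) @ w2


# Per-Gaussian offsets: 3 for position, 3 for log-scale, 4 for the rotation quaternion.
OUT_DIM = 10
idn = make_mlp(4, 32, OUT_DIM)  # hypothetical local (instantaneous) branch, input (x, y, z, t)
gmn = make_mlp(4, 32, OUT_DIM)  # hypothetical global-motion branch, same input for simplicity


def deform(mu, log_scale, quat, t):
    """Warp canonical Gaussians to time t; summing the two branches is an assumption."""
    n = mu.shape[0]
    x = np.concatenate([mu, np.full((n, 1), t)], axis=1)  # (N, 4)
    delta = idn(x) + gmn(x)                                 # (N, 10)
    mu_t = mu + delta[:, 0:3]
    scale_t = np.exp(log_scale + delta[:, 3:6])             # positive, anisotropic scales
    q = quat + delta[:, 6:10]
    q_t = q / np.linalg.norm(q, axis=1, keepdims=True)      # renormalize to a unit quaternion
    return mu_t, scale_t, q_t


def quat_to_rot(q):
    """Convert unit quaternions (w, x, y, z) of shape (N, 4) to (N, 3, 3) rotation matrices."""
    w, x, y, z = q[:, 0], q[:, 1], q[:, 2], q[:, 3]
    return np.stack([
        np.stack([1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)], -1),
        np.stack([2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)], -1),
        np.stack([2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)], -1),
    ], axis=1)


def covariance(scale, q):
    """Anisotropic covariance Sigma = R S S^T R^T, one 3x3 matrix per Gaussian."""
    R = quat_to_rot(q)
    S = np.einsum("ni,ij->nij", scale, np.eye(3))  # diagonal scale matrices
    M = R @ S
    return M @ np.transpose(M, (0, 2, 1))


# Usage: five canonical Gaussians with identity rotation, warped to t = 0.5.
N = 5
mu0 = rng.normal(size=(N, 3))
log_s0 = np.full((N, 3), -2.0)
q0 = np.tile([1.0, 0.0, 0.0, 0.0], (N, 1))
mu_t, scale_t, q_t = deform(mu0, log_s0, q0, t=0.5)
cov_t = covariance(scale_t, q_t)
```

Using separate branches keeps the local, per-timestep corrections apart from the smoother global trajectory. In a trained model, the inputs, encodings and combination rule would follow the paper rather than this toy choice.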

📝 Abstract

We introduce FLAG-4D, a framework for synthesizing novel views of dynamic scenes by reconstructing how 3D Gaussian primitives evolve through space and time. Existing methods typically rely on a single Multilayer Perceptron (MLP) to model temporal deformations. They often struggle to capture complex point motions and fine-grained dynamic details consistently over time, especially from sparse input views. FLAG-4D addresses this with a dual-deformation network that warps a canonical set of 3D Gaussians over time, updating their positions and anisotropic shapes. The network consists of two parts: an Instantaneous Deformation Network (IDN), which models fine-grained local deformations, and a Global Motion Network (GMN), which captures long-range dynamics. The two are refined through mutual learning. To keep these deformations accurate and temporally smooth, FLAG-4D incorporates dense motion features from a pretrained optical flow backbone. We fuse these motion cues from adjacent timeframes and use a deformation-guided attention mechanism to align the flow information with the current state of each evolving 3D Gaussian. Extensive experiments demonstrate that FLAG-4D achieves higher-fidelity and more temporally coherent reconstructions, with finer detail preservation, than state-of-the-art methods.
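
The deformation-guided attention step can be sketched as scaled dot-product attention, with heavy caveats. This is a minimal illustration under stated assumptions, not the paper's exact formulation. Each Gaussian's deformation feature acts as the query. Flow features from K adjacent timeframes act as keys and values, and they are assumed to be already sampled at the Gaussian's projected image location. The flow backbone is not named here, so random arrays stand in for its features. The projection matrices `w_q`, `w_k` and `w_v` and the residual fusion at the end are hypothetical choices.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def deformation_guided_attention(deform_feat, flow_feats, w_q, w_k, w_v):
    """Attend from each Gaussian's deformation state to flow features of adjacent frames.

    deform_feat: (N, D) per-Gaussian deformation features, used as queries.
    flow_feats:  (N, K, C) features from K adjacent timeframes, assumed to be
                 pre-sampled at each Gaussian's projected image location.
    Returns the fused (N, D) motion feature and the (N, K) attention weights.
    """
    q = deform_feat @ w_q                      # (N, d)
    k = flow_feats @ w_k                       # (N, K, d)
    v = flow_feats @ w_v                       # (N, K, D)
    scores = np.einsum("nd,nkd->nk", q, k) / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)            # one weight per adjacent frame
    fused = np.einsum("nk,nkd->nd", attn, v)   # weighted sum of flow values
    return fused, attn


# Usage with random stand-ins for learned weights and flow-backbone features.
rng = np.random.default_rng(0)
N, K, C, D, d = 4, 3, 8, 16, 8   # Gaussians, adjacent frames, flow dim, feature dim, key dim
deform_feat = rng.normal(size=(N, D))
flow_feats = rng.normal(size=(N, K, C))
w_q = rng.normal(size=(D, d)) / np.sqrt(D)
w_k = rng.normal(size=(C, d)) / np.sqrt(C)
w_v = rng.normal(size=(C, D)) / np.sqrt(C)

fused, attn = deformation_guided_attention(deform_feat, flow_feats, w_q, w_k, w_v)
refined = deform_feat + fused  # residual fusion back into the deformation feature (assumption)
```

Because the query comes from the Gaussian's current deformation state, each Gaussian can weight the adjacent frames whose motion best matches where it is moving. If all frames carry identical flow features, the weights collapse to a uniform 1/K.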

Problem

4D reconstruction

dynamic scenes

temporal deformation

sparse views

motion coherence

Innovation

dual-deformation network

4D reconstruction

3D Gaussian primitives