TimeFormer: Capturing Temporal Relationships of Deformable 3D Gaussians for Robust Reconstruction

📅 2024-11-18
🏛️ arXiv.org
📈 Citations: 2 (influential: 0)
🤖 AI Summary
Dynamic scene reconstruction suffers from insufficient robustness under large motions, extreme geometries, and highly reflective surfaces. To address this, we propose TimeFormer—the first cross-temporal transformer module explicitly designed for deformable 3D Gaussians—that implicitly models inter-frame temporal dependencies without imposing explicit motion priors or constraints. We introduce a dual-stream optimization strategy to distill motion-aware knowledge and incorporate temporal consistency supervision during training, while incurring zero inference overhead. Our method supports both multi-view and monocular inputs, preserving the original rendering speed of 3D Gaussian splatting. Extensive experiments demonstrate significant improvements in geometric accuracy and appearance fidelity over state-of-the-art methods, with consistent gains in both qualitative visual quality and quantitative metrics across diverse dynamic scenes.

📝 Abstract
Dynamic scene reconstruction is a long-term challenge in 3D vision. Recent methods extend 3D Gaussian Splatting to dynamic scenes via additional deformation fields and apply explicit constraints such as motion flow to guide the deformation. However, they learn motion changes from individual timestamps independently, making it challenging to reconstruct complex scenes, particularly those with violent movement, extreme-shaped geometries, or reflective surfaces. To address this issue, we design a plug-and-play module called TimeFormer that equips existing deformable 3D Gaussian reconstruction methods with the ability to implicitly model motion patterns from a learning perspective. Specifically, TimeFormer includes a Cross-Temporal Transformer Encoder, which adaptively learns the temporal relationships of deformable 3D Gaussians. Furthermore, we propose a two-stream optimization strategy that transfers the motion knowledge learned by TimeFormer to the base stream during the training phase. This allows us to remove TimeFormer during inference, thereby preserving the original rendering speed. Extensive experiments on multi-view and monocular dynamic scenes validate the qualitative and quantitative improvements brought by TimeFormer. Project Page: https://patrickddj.github.io/TimeFormer/
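The core idea of the Cross-Temporal Transformer Encoder can be illustrated with a minimal NumPy sketch: each Gaussian's per-timestamp features attend over the time axis, so the deformation at one timestamp can draw on evidence from others. All shapes, weight names, and the single-head formulation below are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_temporal_attention(feats, Wq, Wk, Wv):
    """Single-head self-attention across the time axis, per Gaussian.

    feats: (N, T, D) -- N Gaussians, T timestamps, D feature dims.
    Each Gaussian attends over its own T per-timestamp features,
    letting deformations at one time borrow evidence from others.
    """
    q = feats @ Wq                                             # (N, T, D)
    k = feats @ Wk
    v = feats @ Wv
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])   # (N, T, T)
    attn = softmax(scores, axis=-1)                            # rows sum to 1
    return attn @ v                                            # (N, T, D)

# toy usage with random features and weights
rng = np.random.default_rng(0)
N, T, D = 4, 5, 8
feats = rng.standard_normal((N, T, D))
Wq, Wk, Wv = (rng.standard_normal((D, D)) for _ in range(3))
out = cross_temporal_attention(feats, Wq, Wk, Wv)
print(out.shape)  # (4, 5, 8)
```

The output keeps the per-Gaussian, per-timestamp layout, so it can drop into an existing deformation pipeline as a temporal refinement of the input features.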
Problem

Research questions and friction points this paper is trying to address.

Modeling temporal relationships in dynamic 3D scene reconstruction
Addressing complex motion and geometry in deformable Gaussian reconstruction
Enhancing reconstruction robustness for violent movements and reflective surfaces
Innovation

Methods, ideas, or system contributions that make the work stand out.

Plug-and-play module models motion implicitly
Cross-Temporal Transformer learns temporal relationships
Two-stream optimization preserves original rendering speed
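The two-stream idea in the bullets above can be sketched as a joint loss over a shared deformation function: one stream deforms raw features, the other deforms TimeFormer-refined features, and because both losses update the shared weights, the base stream absorbs the motion knowledge and TimeFormer can be dropped at inference. The functions and shapes below are stand-ins, not the paper's implementation.

```python
import numpy as np

def render_loss(pred, gt):
    # L1 photometric loss as a stand-in for the full rendering loss
    return np.abs(pred - gt).mean()

def two_stream_loss(base_deform, timeformer_refine, feats, gt):
    """Joint loss for the (assumed) dual-stream training scheme.

    Both streams share the same deformation function; the auxiliary
    stream first refines features with the TimeFormer module. The
    shared weights receive gradients from both terms, so the base
    stream alone suffices at inference time.
    """
    pred_base = base_deform(feats)                     # base stream
    pred_aux = base_deform(timeformer_refine(feats))   # TimeFormer stream
    return render_loss(pred_base, gt) + render_loss(pred_aux, gt)

# toy usage with hypothetical stand-in functions
feats = np.ones((2, 3))
gt = np.full((2, 3), 2.0)
base_deform = lambda f: 2.0 * f          # hypothetical deformation field
timeformer_refine = lambda f: f + 0.5    # hypothetical TimeFormer refinement
loss = two_stream_loss(base_deform, timeformer_refine, feats, gt)
print(loss)  # 1.0
```

At test time only `base_deform` would run, which is why the original 3D Gaussian Splatting rendering speed is preserved.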
Authors

Dadong Jiang, Tianjin University
Zhihui Ke, Tianjin University
Xiaobo Zhou, Tianjin University
Zhi Hou, The University of Sydney (Computer Vision, Machine Learning)
Xianghui Yang, Tencent Hunyuan
Wenbo Hu, Tencent AI Lab
Tie Qiu, Tianjin University (Industrial Internet of Things, Big Data, Intelligent Networking)
Chunchao Guo, Tencent Hunyuan