ClipGStream: Clip-Stream Gaussian Splatting for Any Length and Any Motion Multi-View Dynamic Scene Reconstruction

📅 2026-04-15
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work targets multi-view dynamic scene reconstruction under large motion, where temporal stability, bounded memory consumption, and scalability to long sequences are hard to achieve together. The authors propose ClipGStream, a hybrid reconstruction framework that partitions the input video into short clips. Within each clip, local dynamics are modeled with dynamic 3D Gaussian splatting and a spatio-temporal field, and residual anchors compensate for local variations. Consistency across clips is maintained through anchor inheritance and a shared decoder. Because streaming optimization runs at the clip level rather than the frame level, the method can reconstruct arbitrarily long videos while preserving temporal coherence. The authors report state-of-the-art reconstruction quality and temporal stability on multiple dynamic scene benchmarks, together with reduced memory overhead on long dynamic sequences.

📝 Abstract
Dynamic 3D scene reconstruction is essential for immersive media such as VR, MR, and XR, yet remains challenging for long multi-view sequences with large-scale motion. Existing dynamic Gaussian approaches are either Frame-Stream, offering scalability but poor temporal stability, or Clip, achieving local consistency at the cost of high memory and limited sequence length. We propose ClipGStream, a hybrid reconstruction framework that performs stream optimization at the clip level rather than the frame level. The sequence is divided into short clips, where dynamic motion is modeled using clip-independent spatio-temporal fields and residual anchor compensation to capture local variations efficiently, while inter-clip inherited anchors and decoders maintain structural consistency across clips. This Clip-Stream design enables scalable, flicker-free reconstruction of long dynamic videos with high temporal coherence and reduced memory overhead. Extensive experiments demonstrate that ClipGStream achieves state-of-the-art reconstruction quality and efficiency. The project page is available at: https://liangjie1999.github.io/ClipGStreamWeb/
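
The clip-level streaming idea described in the abstract can be illustrated with a minimal control-flow sketch. It is not the authors' implementation: `CarriedState`, `split_into_clips`, `optimize_clip`, and `reconstruct` are hypothetical names, the anchor count is an arbitrary placeholder, and no Gaussian splatting or training is performed. The sketch only shows the sequence being cut into short clips that are processed one at a time, with state handed from each clip to the next.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CarriedState:
    """Hypothetical stand-in for what one clip passes to the next."""
    num_anchors: int  # placeholder for inherited anchor parameters
    clips_seen: int   # how many clips have updated the carried state


def split_into_clips(num_frames: int, clip_len: int) -> List[Tuple[int, int]]:
    """Return half-open frame ranges [start, end) that cover the sequence."""
    if num_frames < 0 or clip_len <= 0:
        raise ValueError("num_frames must be >= 0 and clip_len must be > 0")
    return [(start, min(start + clip_len, num_frames))
            for start in range(0, num_frames, clip_len)]


def optimize_clip(frame_ids: range,
                  carried: Optional[CarriedState]) -> CarriedState:
    """Placeholder for per-clip optimization; no real training happens here."""
    if carried is None:
        # First clip: initialize from scratch (the count is arbitrary).
        return CarriedState(num_anchors=100, clips_seen=1)
    # Later clips: start from the inherited state instead of re-initializing.
    return CarriedState(num_anchors=carried.num_anchors,
                        clips_seen=carried.clips_seen + 1)


def reconstruct(num_frames: int,
                clip_len: int) -> Tuple[Optional[CarriedState], int]:
    """Process clips in order; return the final state and the largest
    number of frames held at once."""
    carried: Optional[CarriedState] = None
    peak_frames = 0
    for start, end in split_into_clips(num_frames, clip_len):
        frame_ids = range(start, end)  # only this clip's frames are "loaded"
        carried = optimize_clip(frame_ids, carried)
        peak_frames = max(peak_frames, len(frame_ids))
    return carried, peak_frames
```

In this toy loop the number of frames held at once never exceeds `clip_len`, whatever the sequence length, which is the intuition behind streaming at the clip level. It says nothing about the memory behavior of the actual method.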
Problem

Research questions and friction points this paper is trying to address.

dynamic 3D scene reconstruction
multi-view sequences
temporal coherence
scalability
memory efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Clip-Stream
Dynamic Gaussian Splatting
Temporal Coherence
Memory-Efficient Reconstruction
Multi-View Dynamic Scene
Authors
Jie Liang
Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Shenzhen Graduate School, Peking University; Pengcheng Laboratory
Jiahao Wu
The Chinese University of Hong Kong
Medical Robots, Robot-assisted Microsurgery, Motion Planning
Chao Wang
Pengcheng Laboratory
Jiayu Yang
The Australian National University
3D Computer Vision, 3D AIGC, 3D Reconstruction, Multi-view Stereo, VR AR XR
Xiaoyun Zheng
Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Shenzhen Graduate School, Peking University; Pengcheng Laboratory
Kaiqiang Xiong
Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Shenzhen Graduate School, Peking University; Pengcheng Laboratory
Zhanke Wang
Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Shenzhen Graduate School, Peking University; Pengcheng Laboratory
Jinbo Yan
Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Shenzhen Graduate School, Peking University; Pengcheng Laboratory
Feng Gao
Peking University
Ronggang Wang
Shenzhen Graduate School, Peking University
Immersive Video Coding and Processing