Split4D: Decomposed 4D Scene Reconstruction Without Video Segmentation

📅 2025-12-27
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Existing methods for multi-view 4D scene reconstruction rely on unstable video segmentation, leading to poor robustness. To address this, we propose Freetime FeatureGS, a decoupled 4D reconstruction framework that requires no video segmentation. It leverages single-frame image segmentation as weak supervision to guide Gaussian primitives in learning differentiable temporal features, linear motion modeling, and cross-frame contrastive constraints. Our approach integrates dynamic Gaussian splatting rendering, a temporal contrastive loss, and a streaming, temporally ordered sampling strategy. To our knowledge, this is the first method enabling instance-level 4D reconstruction free of video segmentation, with natural temporal extrapolation. Extensive experiments demonstrate significant improvements over state-of-the-art methods across multiple benchmarks: higher reconstruction accuracy, more robust optimization, and effective mitigation of local minima.

πŸ“ Abstract
This paper addresses the problem of decomposed 4D scene reconstruction from multi-view videos. Recent methods achieve this by lifting video segmentation results to a 4D representation through differentiable rendering techniques. They therefore heavily rely on the quality of video segmentation maps, which are often unstable, leading to unreliable reconstruction results. To overcome this challenge, our key idea is to represent the decomposed 4D scene with Freetime FeatureGS and design a streaming feature learning strategy to accurately recover it from per-image segmentation maps, eliminating the need for video segmentation. Freetime FeatureGS models the dynamic scene as a set of Gaussian primitives with learnable features and linear motion ability, allowing them to move to neighboring regions over time. We apply a contrastive loss to Freetime FeatureGS, forcing primitive features to be close or far apart based on whether their projections belong to the same instance in the 2D segmentation map. As our Gaussian primitives can move across time, this naturally extends feature learning to the temporal dimension, achieving 4D segmentation. Furthermore, we sample observations for training in a temporally ordered manner, enabling the streaming propagation of features over time and effectively avoiding local minima during the optimization process. Experimental results on several datasets show that the reconstruction quality of our method outperforms recent methods by a large margin.
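The contrastive constraint described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the function name, the margin value, and the brute-force pairwise formulation are all assumptions. Each Gaussian primitive carries a learnable feature vector, and pairs are pulled together or pushed apart depending on whether their 2D projections land in the same instance of a per-image segmentation map.

```python
import numpy as np

def contrastive_feature_loss(features, instance_ids, margin=1.0):
    """Hypothetical sketch of a pairwise contrastive loss on primitive features.

    features:     (N, D) array, one learnable feature per Gaussian primitive.
    instance_ids: (N,) array, instance label of each primitive's projection
                  in the 2D segmentation map of the current image.
    """
    n = len(features)
    loss, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(features[i] - features[j])
            if instance_ids[i] == instance_ids[j]:
                loss += d ** 2                     # pull same-instance features together
            else:
                loss += max(0.0, margin - d) ** 2  # push different instances apart
            pairs += 1
    return loss / max(pairs, 1)
```

Because the primitives themselves move across frames (via the linear motion model), applying this per-image loss at every time step is what extends the instance features into the temporal dimension.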
Problem

Research questions and friction points this paper is trying to address.

Decomposes 4D scenes from multi-view videos without video segmentation
Overcomes unreliable reconstruction from unstable video segmentation maps
Uses Gaussian primitives with contrastive loss for 4D segmentation and streaming learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Freetime FeatureGS models dynamic scenes with Gaussian primitives
Contrastive loss aligns features with 2D segmentation maps
Streaming feature learning propagates features temporally without video segmentation
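The streaming idea above amounts to a temporally ordered sampler: rather than drawing training observations at random across the whole video, frames are visited in time order so features learned at step t seed step t+1. A minimal sketch, assuming per-frame view shuffling (the function name and interface are illustrative, not from the paper):

```python
import random

def streaming_sampler(num_frames, num_views, seed=0):
    """Yield (time, view) training observations in strictly increasing
    time order, shuffling only the camera views within each frame."""
    rng = random.Random(seed)
    for t in range(num_frames):        # time advances monotonically
        views = list(range(num_views))
        rng.shuffle(views)             # random view order within one frame
        for v in views:
            yield t, v
```

Ordering samples this way lets features propagate forward through time and, per the abstract, helps the optimization avoid local minima that random sampling can fall into.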
Yongzhen Hu
Zhejiang University, China and Ant Group, China
Yihui Yang
Zhejiang University, China
Haotong Lin
Zhejiang University
Computer Vision and Graphics
Yifan Wang
Zhejiang University, China
Junting Dong
Zhejiang University
Computer Vision
Yifu Deng
Ant Group, China
Xinyu Zhu
Ant Group, China
Fan Jia
Faculty of Chemistry and Biochemistry, Ruhr-University of Bochum
Organic Chemistry
Hujun Bao
State Key Lab of CAD&CG, Zhejiang University, China
Xiaowei Zhou
Professor of Computer Science, Zhejiang University
Computer Vision, Computer Graphics
Sida Peng
Zhejiang University
Computer Vision, Computer Graphics