AI Summary
Existing methods rely on unstable video segmentation, leading to poor robustness in multi-view 4D scene reconstruction. To address this, we propose Freetime FeatureGS, a segmentation-free framework for decomposed 4D reconstruction. It leverages single-frame image segmentation as weak supervision to guide Gaussian primitives in learning differentiable temporal features, linear motion modeling, and cross-frame contrastive constraints. Our approach integrates dynamic Gaussian splatting rendering, a temporal contrastive loss, and a streaming ordered sampling strategy. To our knowledge, this is the first method enabling instance-level, segmentation-agnostic 4D reconstruction with natural temporal extrapolation. Extensive experiments demonstrate significant improvements over state-of-the-art methods across multiple benchmarks: higher reconstruction accuracy, enhanced optimization robustness, and effective mitigation of local minima.
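The linear motion modeling mentioned above can be illustrated with a minimal sketch: each Gaussian primitive carries a base center and a learnable velocity, so its center moves linearly over time and extrapolates naturally beyond the training interval. The class and parameter names below are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

class LinearMotionGaussian:
    """Hypothetical sketch of a Gaussian primitive with linear motion.

    The primitive's center at time t is its base position plus a
    learnable velocity scaled by the elapsed time. Because the motion
    model is linear, querying t outside the training window performs
    temporal extrapolation for free.
    """

    def __init__(self, position, velocity, t0=0.0):
        self.position = np.asarray(position, dtype=float)  # center at reference time t0
        self.velocity = np.asarray(velocity, dtype=float)  # displacement per unit time
        self.t0 = t0

    def center_at(self, t):
        # Linear motion: x(t) = x0 + v * (t - t0)
        return self.position + self.velocity * (t - self.t0)

# A primitive drifting along x at 0.1 units per frame
g = LinearMotionGaussian([0.0, 0.0, 0.0], [0.1, 0.0, 0.0])
```

In a full system the velocity would be optimized jointly with the Gaussian's appearance and feature parameters through the differentiable renderer.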
Abstract
This paper addresses the problem of decomposed 4D scene reconstruction from multi-view videos. Recent methods achieve this by lifting video segmentation results to a 4D representation through differentiable rendering techniques. Consequently, they rely heavily on the quality of video segmentation maps, which are often unstable, leading to unreliable reconstruction results. To overcome this challenge, our key idea is to represent the decomposed 4D scene with Freetime FeatureGS and design a streaming feature learning strategy to accurately recover it from per-image segmentation maps, eliminating the need for video segmentation. Freetime FeatureGS models the dynamic scene as a set of Gaussian primitives with learnable features and linear motion, allowing them to move to neighboring regions over time. We apply a contrastive loss to Freetime FeatureGS, pulling primitive features together or pushing them apart depending on whether their projections belong to the same instance in the 2D segmentation map. Because our Gaussian primitives can move across time, feature learning naturally extends to the temporal dimension, achieving 4D segmentation. Furthermore, we sample observations for training in a temporally ordered manner, enabling the streaming propagation of features over time and effectively avoiding local minima during the optimization process. Experimental results on several datasets show that the reconstruction quality of our method outperforms that of recent methods by a large margin.
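The contrastive objective described above can be sketched in a few lines: given per-primitive features and the instance ID that each primitive's projection falls into in a 2D segmentation map, same-instance pairs are pulled together and different-instance pairs are pushed apart up to a margin. This is a generic pull-push contrastive loss for illustration; the function name, the margin hinge form, and the pairwise averaging are assumptions, not the paper's exact formulation.

```python
import numpy as np

def instance_contrastive_loss(features, instance_ids, margin=1.0):
    """Illustrative pairwise contrastive loss over primitive features.

    features     : (N, D) array of per-primitive feature vectors
    instance_ids : length-N list of instance labels from a 2D segmentation map
    margin       : minimum desired distance between different-instance features

    Same-instance pairs contribute their squared distance (pull term);
    different-instance pairs contribute a squared hinge on the margin
    (push term). Both terms are averaged over their pair counts.
    """
    features = np.asarray(features, dtype=float)
    n = len(features)
    pull, push = 0.0, 0.0
    n_pull, n_push = 0, 0
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(features[i] - features[j])
            if instance_ids[i] == instance_ids[j]:
                pull += d ** 2          # pull same-instance features together
                n_pull += 1
            else:
                push += max(0.0, margin - d) ** 2  # push different instances apart
                n_push += 1
    return pull / max(n_pull, 1) + push / max(n_push, 1)

# Two primitives on one instance, one on another, already well separated:
loss = instance_contrastive_loss(
    [[0.0, 0.0], [0.0, 0.0], [2.0, 0.0]], [1, 1, 2], margin=1.0
)
```

In practice this loss would be computed on sampled primitives whose projections land inside the current frame's segmentation map, with gradients flowing back through the differentiable renderer; a streaming, temporally ordered sampling of frames then propagates the learned features forward in time.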