🤖 AI Summary
To address unsupervised, distractor-free 3D reconstruction from dynamic, cluttered first-person (egocentric) video, this paper proposes a framework based on Gaussian Splatting that decouples dynamic and static content. It introduces a dual-stream Gaussian architecture that represents foreground and background separately and composes them with a learnable probabilistic mask, so both streams are optimized jointly, end to end, and fully self-supervised, without motion priors, keyframe selection, or segmentation annotations. Key innovations include: (1) a dynamic-static decoupling mechanism driven by the probabilistic mask, which explicitly separates moving objects from static geometry; and (2) an integrated design that combines self-supervised Gaussian Splatting with joint rendering optimization. Evaluations on benchmarks including NeRF-on-the-go, ADT, AEA, Hot3D, and EPIC-Fields show that the method consistently outperforms existing approaches, with improved distractor suppression and geometric fidelity.
📝 Abstract
Reconstructing clean, distractor-free 3D scenes from real-world captures remains a significant challenge, particularly in highly dynamic and cluttered settings such as egocentric videos. To tackle this problem, we introduce DeGauss, a simple and robust self-supervised framework for dynamic scene reconstruction based on a decoupled dynamic-static Gaussian Splatting design. DeGauss models dynamic elements with foreground Gaussians and static content with background Gaussians, using a probabilistic mask to coordinate their composition and enable independent yet complementary optimization. DeGauss generalizes robustly across a wide range of real-world scenarios, from casual image collections to long, dynamic egocentric videos, without relying on complex heuristics or extensive supervision. Experiments on benchmarks including NeRF-on-the-go, ADT, AEA, Hot3D, and EPIC-Fields demonstrate that DeGauss consistently outperforms existing methods, establishing a strong baseline for generalizable, distractor-free 3D reconstruction in highly dynamic, interaction-rich environments.
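
The abstract does not state the composition formula, so the following is only an illustrative sketch of how a probabilistic mask could coordinate two rendered streams. It assumes a per-pixel convex blend, where the mask value is the probability that a pixel belongs to the dynamic foreground. The function names `composite_pixel` and `composite_image` and the blend itself are assumptions made for illustration, not the paper's implementation.

```python
from typing import List, Sequence

Pixel = Sequence[float]  # RGB triple with channels in [0, 1]


def composite_pixel(fg: Pixel, bg: Pixel, mask_prob: float) -> List[float]:
    """Blend one foreground and one background color.

    mask_prob is the assumed probability that the pixel is dynamic:
    1.0 keeps the foreground color, 0.0 keeps the background color.
    """
    if not 0.0 <= mask_prob <= 1.0:
        raise ValueError("mask_prob must lie in [0, 1]")
    return [mask_prob * f + (1.0 - mask_prob) * b for f, b in zip(fg, bg)]


def composite_image(
    fg_img: Sequence[Sequence[Pixel]],
    bg_img: Sequence[Sequence[Pixel]],
    mask: Sequence[Sequence[float]],
) -> List[List[List[float]]]:
    """Apply composite_pixel to every pixel of two equally sized images."""
    return [
        [composite_pixel(f, b, m) for f, b, m in zip(fg_row, bg_row, mask_row)]
        for fg_row, bg_row, mask_row in zip(fg_img, bg_img, mask)
    ]


if __name__ == "__main__":
    # Hypothetical 1x2 renders: a red foreground over a blue background.
    fg = [[(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
    bg = [[(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]]
    mask = [[1.0, 0.0]]  # first pixel dynamic, second pixel static
    print(composite_image(fg, bg, mask))
```

Under this assumption, a photometric loss on the blended image would send gradients to both streams and to the mask, which is one way to read "independent yet complementary optimization"; at test time, rendering only the background stream would give the distractor-free view.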