🤖 AI Summary
This work addresses the challenge of achieving high-fidelity 4D dynamic scene reconstruction from only four sparse-view cameras. To this end, the authors propose 4C4D, a novel framework that, for the first time, enables high-quality 4D reconstruction from such sparse inputs by leveraging 4D Gaussian splatting. Central to their approach is a Neural Decaying Function that dynamically modulates Gaussian opacities, mitigating the imbalance between geometry and appearance modeling under sparse observations. This mechanism substantially enhances geometric completeness and temporal consistency. Extensive experiments show that 4C4D outperforms existing methods across multiple sparse multi-view datasets, delivering high-fidelity, temporally coherent novel-view synthesis.
📝 Abstract
This paper tackles the challenge of recovering 4D dynamic scenes from videos captured by as few as four portable cameras. Learning to model scene dynamics for temporally consistent novel-view rendering is a foundational task in computer graphics, for which previous works often require dense multi-view captures from camera arrays of dozens or even hundreds of views. We propose \textbf{4C4D}, a novel framework that enables high-fidelity 4D Gaussian Splatting from videos captured by extremely sparse camera setups. Our key insight is that learning geometry under sparse settings is substantially more difficult than modeling appearance. Driven by this observation, we introduce a Neural Decaying Function on Gaussian opacities to enhance the geometric modeling capability of 4D Gaussians. This design mitigates the inherent imbalance between geometry and appearance modeling in 4DGS by encouraging the gradients to focus more on geometric learning. Extensive experiments across sparse-view datasets with varying camera overlaps show that 4C4D achieves superior performance over prior art. Project page: https://junshengzhou.github.io/4C4D.
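To give a rough intuition for how an opacity-decay mechanism can rebalance learning, here is a minimal illustrative sketch. The closed-form exponential falloff, the parameter names, and the temporal-center formulation below are all assumptions for illustration; the paper's actual Neural Decaying Function is a learned component whose exact form is not specified in this abstract.

```python
import math

def decayed_opacity(base_opacity, t, t_center, decay_rate):
    """Illustrative (not the paper's) opacity modulation: attenuate a
    Gaussian's opacity with a falloff around a hypothetical temporal
    center t_center. Down-weighting opacity away from well-observed
    moments is one way to push gradients toward geometry rather than
    letting appearance overfit sparse views."""
    return base_opacity * math.exp(-decay_rate * (t - t_center) ** 2)
```

For example, a Gaussian with base opacity 0.8 keeps its full opacity at its temporal center and is progressively attenuated at distant timestamps, so its appearance contribution to the loss shrinks there.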