MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos with Depth Priors

📅 2024-06-01
📈 Citations: 16
Influential: 1

🤖 AI Summary
Reconstructing novel views of dynamic scenes from monocular videos captured by static or slowly moving cameras remains challenging. To address this, we propose the first 3D-aware dynamic Gaussian splatting method integrated with single-image depth priors. Our approach introduces three key innovations: (1) a dynamic initialization strategy that leverages single-frame depth estimation as geometric prior to guide Gaussian parameter generation; (2) joint optimization of deformable Gaussians and an implicit deformation field; and (3) a multi-scale robust depth loss enforcing inter-frame depth consistency. Unlike prior methods, ours does not require rapid camera motion. Evaluated on casually captured videos, it achieves a 2.1 dB PSNR improvement over state-of-the-art dynamic NeRF and dynamic Gaussian splatting methods, and—critically—enables high-fidelity dynamic view synthesis under static-camera capture conditions for the first time.
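The depth-prior-guided initialization described above amounts to lifting a monocular depth map into a 3D point cloud that seeds the Gaussian positions. The sketch below is illustrative only, assuming a simple pinhole camera model; the function name `unproject_depth` and the toy intrinsics are our own, not from the paper.

```python
import numpy as np

def unproject_depth(depth, fx, fy, cx, cy):
    """Lift a single-view depth map (H, W) to a 3D point cloud (H*W, 3)
    in camera coordinates via the pinhole model: x = (u - cx) * z / fx."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# Toy example: a 4x4 image at a constant depth of 2 m, principal point (2, 2).
pts = unproject_depth(np.full((4, 4), 2.0), fx=100, fy=100, cx=2, cy=2)
```

In a pipeline like MoDGS's, points such as `pts` would serve as initial Gaussian centers before joint optimization with the deformation field.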

📝 Abstract
In this paper, we propose MoDGS, a new pipeline to render novel views of dynamic scenes from a casually captured monocular video. Previous monocular dynamic NeRF or Gaussian Splatting methods strongly rely on the rapid movement of input cameras to construct multiview consistency but struggle to reconstruct dynamic scenes on casually captured input videos whose cameras are either static or move slowly. To address this challenging task, MoDGS adopts recent single-view depth estimation methods to guide the learning of the dynamic scene. Then, a novel 3D-aware initialization method is proposed to learn a reasonable deformation field and a new robust depth loss is proposed to guide the learning of dynamic scene geometry. Comprehensive experiments demonstrate that MoDGS is able to render high-quality novel view images of dynamic scenes from just a casually captured monocular video, which outperforms state-of-the-art methods by a significant margin. The code will be publicly available.
Problem

Research questions and friction points this paper is trying to address.

Rendering dynamic scenes from monocular videos with static/slow cameras
Improving dynamic scene reconstruction using depth estimation
Enhancing novel view synthesis quality for casually captured videos
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses single-view depth estimation for dynamic scenes
Introduces 3D-aware initialization for deformation fields
Proposes robust depth loss for geometry learning
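A robust depth loss must tolerate the unknown scale and shift of single-view depth predictions. One common scale- and shift-invariant choice is the negative Pearson correlation between rendered depth and the depth prior; the sketch below illustrates that idea and is our own assumption, not the paper's exact formulation.

```python
import numpy as np

def pearson_depth_loss(rendered, prior, eps=1e-8):
    """1 - Pearson correlation between rendered depth and a monocular
    depth prior. Invariant to affine (scale + shift) depth ambiguity."""
    r = rendered.ravel() - rendered.mean()
    p = prior.ravel() - prior.mean()
    corr = (r * p).sum() / (np.sqrt((r ** 2).sum() * (p ** 2).sum()) + eps)
    return 1.0 - corr

# Depth maps that agree up to an affine transform incur near-zero loss.
d = np.random.default_rng(0).random((8, 8))
loss = pearson_depth_loss(2.5 * d + 1.0, d)
```

Because the loss ignores global scale and offset, it can supervise geometry even when the depth estimator's output units are arbitrary.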
👥 Authors
Qingming Liu, City University of Hong Kong, China
Yuan Liu, The University of Hong Kong, China
Jie-Chao Wang, The University of Hong Kong, China
Xianqiang Lyv, City University of Hong Kong, China
Peng Wang, The University of Hong Kong, China
Wenping Wang, Texas A&M University (Computer Graphics, Geometric Computing)
Junhui Hou, Department of Computer Science, City University of Hong Kong (Neural Spatial Computing)