Splatography: Sparse multi-view dynamic Gaussian Splatting for filmmaking challenges

📅 2025-11-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the degradation of dynamic 3D reconstruction caused by the sparse camera configurations common in film production, this paper proposes a foreground-background decoupled deformable Gaussian Splatting framework. It is the first to semantically decompose both the canonical Gaussians and the deformation field into foreground and background components, enabling precise reconstruction of transparent objects and dynamic textures from sparse mask guidance, without requiring dense mask supervision. A staged loss schedule performs canonical pretraining followed by dynamic fine-tuning, with the foreground field jointly modeling changes in color, position, and orientation. Evaluated on 3D and 2.5D cinematic datasets, the method achieves state-of-the-art performance: up to 3 dB higher PSNR at half the model size on 3-D scenes, while simultaneously producing high-fidelity, semantically segmented dynamic reconstructions.
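
The summary does not spell out how the sparse t=0 masks assign Gaussians to components. A minimal sketch of one plausible mechanism is below: project each canonical Gaussian centre into a masked view and read off the mask label. The function name, projection convention (camera looking down +z), and pinhole model are illustrative assumptions, not the authors' implementation.

```python
import torch

@torch.no_grad()
def label_foreground(means, K, w2c, mask):
    """Label canonical Gaussian centres as foreground by projecting them
    into one masked view at t=0. A deliberately minimal stand-in for the
    paper's sparse-mask guidance.

    means: (N, 3) Gaussian centres in world coordinates
    K:     (3, 3) pinhole intrinsics
    w2c:   (4, 4) world-to-camera extrinsics
    mask:  (H, W) binary foreground mask for this view at t=0
    """
    N = means.shape[0]
    homog = torch.cat([means, means.new_ones(N, 1)], dim=1)   # (N, 4)
    cam = (w2c @ homog.T).T[:, :3]                            # camera space
    uv = (K @ cam.T).T                                        # pixel homogeneous
    uv = uv[:, :2] / uv[:, 2:3].clamp(min=1e-8)               # perspective divide
    H, W = mask.shape
    u, v = uv[:, 0].long(), uv[:, 1].long()
    visible = (cam[:, 2] > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    fg = torch.zeros(N, dtype=torch.bool, device=means.device)
    fg[visible] = mask[v[visible], u[visible]] > 0
    return fg  # (N,) True where the centre lands inside a foreground mask
```

With the handful of masked views available at t=0, the per-view labels could then be combined (e.g., by majority vote) before splitting the canonical Gaussians into the two sets.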

📝 Abstract
Deformable Gaussian Splatting (GS) accomplishes photorealistic dynamic 3-D reconstruction from dense multi-view video (MVV) by learning to deform a canonical GS representation. However, in filmmaking, tight budgets can result in sparse camera configurations, which limits state-of-the-art (SotA) methods when capturing complex dynamic features. To address this issue, we introduce an approach that splits the canonical Gaussians and deformation field into foreground and background components using a sparse set of masks for frames at t=0. Each representation is separately trained on different loss functions during canonical pre-training. Then, during dynamic training, different parameters are modeled for each deformation field following common filmmaking practices. The foreground stage contains diverse dynamic features, so changes in color, position, and rotation are learned, while the background, which contains film crew and equipment, is typically dimmer and less dynamic, so only changes in point position are learned. Experiments on 3-D and 2.5-D entertainment datasets show that our method produces SotA qualitative and quantitative results: up to 3 dB higher PSNR with half the model size on 3-D scenes. Unlike the SotA, and without the need for dense mask supervision, our method also produces segmented dynamic reconstructions, including transparent and dynamic textures. Code and video comparisons are available online: https://interims-git.github.io/
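
The per-component asymmetry the abstract describes (foreground: color, position, rotation; background: position only) can be made concrete with a small sketch, assuming simple MLP deformation fields conditioned on position and time. The class name, architecture, and input encoding are assumptions for illustration, not the paper's design.

```python
import torch
import torch.nn as nn

class DeformField(nn.Module):
    """Per-component deformation MLP mirroring the asymmetry in the
    abstract: the foreground field predicts offsets for position,
    rotation, and colour; the background field predicts position only."""

    def __init__(self, deform_rotation_color: bool, hidden: int = 128):
        super().__init__()
        self.deform_rotation_color = deform_rotation_color
        self.trunk = nn.Sequential(
            nn.Linear(4, hidden), nn.ReLU(),   # input: (x, y, z, t)
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.d_pos = nn.Linear(hidden, 3)      # position offset
        if deform_rotation_color:
            self.d_rot = nn.Linear(hidden, 4)  # quaternion offset
            self.d_col = nn.Linear(hidden, 3)  # colour offset

    def forward(self, xyz: torch.Tensor, t: float) -> dict:
        ts = torch.full((xyz.shape[0], 1), t, device=xyz.device)
        h = self.trunk(torch.cat([xyz, ts], dim=1))
        out = {"d_pos": self.d_pos(h)}
        if self.deform_rotation_color:
            out["d_rot"] = self.d_rot(h)
            out["d_col"] = self.d_col(h)
        return out

# Foreground learns colour, position, and rotation; background only position.
foreground_field = DeformField(deform_rotation_color=True)
background_field = DeformField(deform_rotation_color=False)
```

Restricting the background head to position offsets is also what halves the model size relative to a symmetric design: the dimmer, less dynamic background simply never pays for rotation and colour parameters.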
Problem

Research questions and friction points this paper is trying to address.

Addresses sparse camera setups in filmmaking that limit dynamic 3D reconstruction quality
Solves foreground-background separation in dynamic Gaussian Splatting without dense masks
Improves reconstruction of complex dynamic features with limited camera views
Innovation

Methods, ideas, or system contributions that make the work stand out.

Splits Gaussians into foreground and background components
Separately trains representations with different loss functions
Models distinct deformation parameters for foreground and background dynamics (see the sketch below)
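
Putting these pieces together, a compressed sketch of the staged schedule might look as follows: canonical pre-training at t=0 with separate masked losses per component, then dynamic fine-tuning of the two deformation fields. All names, losses, step counts, and the additive composite are placeholder assumptions; the authors' actual losses and rasteriser differ.

```python
import torch

def l1(pred, gt):
    return (pred - gt).abs().mean()

def staged_training(render, fg_gaussians, bg_gaussians, fg_field, bg_field,
                    frames_t0, frames_dyn, n_canon=3000, n_dyn=10000):
    """Two-stage schedule: canonical pre-training with per-component masked
    losses, then dynamic fine-tuning through the deformation fields.
    `render` is a placeholder differentiable rasteriser with signature
    render(gaussians, camera, field=None, t=0.0) -> (H, W, 3) image;
    the Gaussian containers are assumed to be nn.Modules."""
    # Stage 1: canonical pre-training on the sparse masked frames at t=0.
    opt = torch.optim.Adam(
        list(fg_gaussians.parameters()) + list(bg_gaussians.parameters()), lr=1e-3)
    for step in range(n_canon):
        camera, gt, mask = frames_t0[step % len(frames_t0)]
        fg_img = render(fg_gaussians, camera)
        bg_img = render(bg_gaussians, camera)
        # Each component is supervised only on its own pixels.
        loss = l1(fg_img * mask, gt * mask) + l1(bg_img * (1 - mask), gt * (1 - mask))
        opt.zero_grad(); loss.backward(); opt.step()

    # Stage 2: dynamic fine-tuning. The canonical Gaussians are frozen here
    # for simplicity; the paper may jointly refine them.
    opt = torch.optim.Adam(
        list(fg_field.parameters()) + list(bg_field.parameters()), lr=1e-4)
    for step in range(n_dyn):
        camera, gt, t = frames_dyn[step % len(frames_dyn)]
        # Naive additive composite; a real pipeline would alpha-blend the
        # two Gaussian sets in a single rasterisation pass.
        img = render(fg_gaussians, camera, field=fg_field, t=t) \
            + render(bg_gaussians, camera, field=bg_field, t=t)
        loss = l1(img, gt)
        opt.zero_grad(); loss.backward(); opt.step()
```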
Authors
Adrian Azzarelli (University of Bristol)
N. Anantrasirichai (Bristol Visual Institute, University of Bristol, UK)
D. Bull (Bristol Visual Institute, University of Bristol, UK)

Keywords: video, 3-D capture, cinematography, AI