🤖 AI Summary
Existing dynamic neural rendering methods (e.g., NeRF, 3D Gaussian Splatting) struggle to jointly model large-scale motion and fine-grained dynamic detail, leading to geometric distortion, temporal instability, and visual artifacts in synthesized videos of real-world scenes. To address this, we propose an adaptive local implicit feature disentanglement framework: learnable spatiotemporal seeds partition the scene into localized spaces; static scene features and a dynamic residual field are explicitly decoupled; and the framework integrates local implicit feature decomposition, temporal-aware Gaussian generation, and static-dynamic co-optimization. To our knowledge, this is the first attempt to model both large-scale and fine-scale dynamics in a unified framework. It achieves performance competitive with the state of the art on multiple fine-scale dynamic benchmarks and scales to complex, large-scale real-world scenes, improving realism, geometric consistency, and temporal stability in dynamic video synthesis. A rough sketch of the seed-based decomposition follows.
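As a minimal illustration of how seeds could partition a scene into local spaces, the sketch below assigns each sample point to its nearest learnable seed; the nearest-seed assignment rule, tensor shapes, and function names are our assumptions, not the released implementation.

```python
import torch

def assign_to_local_spaces(points, seeds):
    """Assign each sample point to its nearest learnable seed.

    points: (N, 3) sample positions in world space
    seeds:  (M, 3) learnable seed positions defining local spaces
    Returns the owning-seed index and the offset within that local
    space, so motion can be modeled relative to each seed.
    """
    d = torch.cdist(points, seeds)        # (N, M) pairwise distances
    owner = d.argmin(dim=1)               # (N,) index of nearest seed
    local_offset = points - seeds[owner]  # (N, 3) local-space coordinates
    return owner, local_offset

# Usage: partition 10k sample points among 512 learnable seeds.
points = torch.rand(10_000, 3)
seeds = torch.nn.Parameter(torch.rand(512, 3))
owner, offset = assign_to_local_spaces(points, seeds)
```

Treating each seed's neighborhood independently keeps every local motion model small, even when the global scene motion is large.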
📝 Abstract
Due to the complex and highly dynamic motion in the real world, synthesizing dynamic videos for arbitrary viewpoints from multi-view inputs is challenging. Previous works based on neural radiance fields or 3D Gaussian splatting are limited to modeling fine-scale motion, which greatly restricts their applicability. In this paper, we introduce LocalDyGS, which consists of two parts that adapt our method to both large-scale and fine-scale motion scenes: 1) We decompose a complex dynamic scene into streamlined local spaces defined by seeds, enabling global modeling by capturing motion within each local space. 2) We decouple static and dynamic features for motion modeling within each local space. A static feature shared across time steps captures time-invariant information, while a dynamic residual field provides time-specific features. The two are combined and decoded to generate Temporal Gaussians that model motion within each local space. The result is a novel dynamic scene reconstruction framework that models highly dynamic real-world scenes more realistically. Our method not only achieves competitive performance against state-of-the-art (SOTA) methods on various fine-scale datasets, but also represents the first attempt to model larger and more complex highly dynamic scenes. Project page: https://wujh2001.github.io/LocalDyGS/.
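To make the static/dynamic decoupling concrete, below is a minimal sketch of how a per-seed static feature and a time-conditioned dynamic residual might be fused and decoded into Temporal Gaussian parameters. The module structure, feature sizes, and the 11-parameter Gaussian layout are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class TemporalGaussianDecoder(nn.Module):
    """Fuse a time-invariant static feature with a time-specific
    dynamic residual, then decode Temporal Gaussian parameters
    (position offset, scale, rotation, opacity); color is omitted."""

    def __init__(self, feat_dim=32, k=4):
        super().__init__()
        self.k = k  # Gaussians generated per local space
        # Dynamic residual field: (static feature, time) -> residual feature.
        self.residual_field = nn.Sequential(
            nn.Linear(feat_dim + 1, 64), nn.ReLU(),
            nn.Linear(64, feat_dim),
        )
        # Decoder: fused feature -> k Gaussians x 11 params
        # (3 offset + 3 log-scale + 4 quaternion + 1 opacity).
        self.decode = nn.Linear(feat_dim, k * 11)

    def forward(self, static_feat, t):
        # static_feat: (M, F), shared across all time steps; t in [0, 1].
        t_col = torch.full((static_feat.shape[0], 1), float(t))
        residual = self.residual_field(torch.cat([static_feat, t_col], dim=-1))
        fused = static_feat + residual  # static feature + dynamic residual
        params = self.decode(fused).view(-1, self.k, 11)
        offset, log_scale, quat, opacity = params.split([3, 3, 4, 1], dim=-1)
        return (offset,
                log_scale.exp(),
                torch.nn.functional.normalize(quat, dim=-1),
                opacity.sigmoid())

# Usage: 512 seeds, each with a 32-d static feature, decoded at time t=0.3.
static_feat = torch.randn(512, 32)
decoder = TemporalGaussianDecoder()
offset, scale, rot, alpha = decoder(static_feat, t=0.3)
```

The design point mirrored here is that the static feature is reused at every time step, so only the lightweight residual field has to vary with time.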