🤖 AI Summary
Existing 4D Gaussian modeling methods for long-duration dynamic videos suffer from three key challenges: memory explosion, temporal flickering, and reconstruction artifacts caused by dynamic occlusions or objects appearing and disappearing. This paper proposes an anchor-relay bidirectional deformation fusion mechanism coupled with a feature-variance-guided hierarchical densification strategy, enabling spatiotemporally consistent, memory-bounded, high-fidelity reconstruction within the 4D Gaussian Splatting framework. Our core contributions are: (1) learnable bidirectional deformation modeling driven by anchor spaces, which mitigates geometric drift under long-range motion; and (2) adaptive opacity blending with hierarchical density optimization, which suppresses temporal flickering and improves robustness to occlusion. Evaluated on our newly constructed long-range dataset SelfCap$_{\text{LR}}$, our method achieves significant improvements over state-of-the-art approaches, delivering flicker-free, memory-efficient, and high-quality reconstructions.
📝 Abstract
Recent advances in 4D Gaussian Splatting (4DGS) have extended the high-speed rendering capability of 3D Gaussian Splatting (3DGS) into the temporal domain, enabling real-time rendering of dynamic scenes. However, a major remaining challenge lies in modeling dynamic videos that contain long-range motion, where a naive extension of existing methods leads to severe memory explosion, temporal flickering, and failure to handle occlusions that appear or disappear over time. To address these challenges, we propose MoRel, a novel 4DGS framework built around an Anchor Relay-based Bidirectional Blending (ARBB) mechanism, which enables temporally consistent and memory-efficient modeling of long-range dynamic scenes. Our method progressively constructs locally canonical key-frame anchor (KfA) spaces at key-frame time indices and models inter-frame deformations at the anchor level, enhancing temporal coherence. By learning bidirectional deformations between KfAs and adaptively blending them through learnable opacity control, our approach mitigates temporal discontinuities and flickering artifacts. We further introduce a Feature-variance-guided Hierarchical Densification (FHD) scheme that effectively densifies KfAs while preserving rendering quality, based on assigned feature-variance levels. To evaluate our model's capability to handle real-world long-range 4D motion, we construct a new dataset containing long-range 4D motion, called SelfCap$_{\text{LR}}$. Compared to previous dynamic video datasets, it has a larger average dynamic-motion magnitude and is captured in spatially wider spaces. Overall, our MoRel achieves temporally coherent and flicker-free long-range 4D reconstruction while maintaining bounded memory usage, demonstrating both scalability and efficiency among dynamic Gaussian-based representations.
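The abstract does not give the exact form of the bidirectional blending, so the following is only a minimal NumPy sketch of one plausible realization: positions deformed forward from the earlier key-frame anchor and backward from the later one are combined by a convex weight, with a learnable per-Gaussian logit (here called `alpha`, a hypothetical name) shifting the crossover point in time. All function and variable names are illustrative assumptions, not the paper's actual API.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def blend_deformations(x_fwd, x_bwd, tau, alpha, eps=1e-6):
    """Convex blend of bidirectionally deformed Gaussian centers (illustrative).

    x_fwd : (N, 3) centers deformed forward from the earlier key-frame anchor
    x_bwd : (N, 3) centers deformed backward from the later key-frame anchor
    tau   : scalar in [0, 1], normalized time within the key-frame interval
    alpha : (N,) hypothetical learnable per-Gaussian logit shifting the blend
    """
    tau = np.clip(tau, eps, 1.0 - eps)
    # logit(tau) + alpha: near tau = 0 the forward branch dominates, near
    # tau = 1 the backward branch dominates; alpha moves the crossover point.
    w = sigmoid(np.log(tau / (1.0 - tau)) + alpha)[:, None]
    return (1.0 - w) * x_fwd + w * x_bwd
```

With `alpha = 0` this reduces to a plain time-symmetric blend; training `alpha` per Gaussian lets regions that are better explained by one anchor (e.g. freshly disoccluded content) lean on that anchor's deformation, which is one way the described opacity-controlled blending could suppress flicker at key-frame boundaries.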