🤖 AI Summary
To address the limited modeling accuracy and temporal instability of 4D Gaussian Splatting in dynamic scenes, particularly under large motions, severe occlusions, and fine geometric details, this paper proposes a cascaded temporal residual learning framework. Methodologically, it introduces a three-level residual architecture ("video-segment-frame") that hierarchically disentangles dynamic signals, an optical-flow-driven adaptive temporal segmentation mechanism that improves robustness to complex motion, and an optimization that combines multi-scale residual learning, differentiable rendering, and temporally parameterized Gaussians. On multiple standard benchmarks, the method achieves state-of-the-art reconstruction accuracy and visual quality while rendering in real time. Its largest gains appear in challenging scenarios involving large camera or object motion, strong occlusions, and intricate scene details.
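The summary does not say how optical flow drives the segmentation. One plausible reading is a greedy rule: accumulate a per-frame motion score and close the current segment once it exceeds a threshold, so fast-moving stretches of the video get shorter segments. The minimal Python sketch below illustrates that idea only; the function `segment_by_flow`, the scalar motion score, and the `threshold` parameter are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def segment_by_flow(flow_mags: Sequence[float], threshold: float) -> List[Tuple[int, int]]:
    """Greedily split a video into contiguous frame segments (hypothetical rule).

    flow_mags[t] is an assumed motion score, e.g. the mean optical-flow
    magnitude between frame t and frame t + 1, so a video with N frames has
    N - 1 scores. Segments are half-open ranges [start, end) covering frames
    0..N-1. A segment is closed as soon as the motion accumulated inside it
    exceeds `threshold`, so high-motion stretches get shorter segments.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    segments: List[Tuple[int, int]] = []
    start = 0
    accumulated = 0.0
    for t, mag in enumerate(flow_mags):
        accumulated += mag
        if accumulated > threshold:
            # Frame t + 1 is reached after too much motion, so it opens a new segment.
            segments.append((start, t + 1))
            start = t + 1
            accumulated = 0.0
    # The remaining frames (always at least one) form the last segment.
    segments.append((start, len(flow_mags) + 1))
    return segments


if __name__ == "__main__":
    # 7 frames: slow motion, then two large jumps, then slow motion again.
    print(segment_by_flow([0.1, 0.1, 0.1, 2.0, 2.0, 0.1], threshold=1.0))
    # [(0, 4), (4, 5), (5, 7)]
```

A fixed-length split would put both large jumps in the same segment as the slow frames; the adaptive rule isolates them, which is the kind of robustness to complex motion the summary attributes to flow-driven segmentation.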
📝 Abstract
Recently, Gaussian Splatting methods have emerged as a desirable substitute for prior Radiance Field methods for novel-view synthesis of scenes captured with multi-view images or videos. In this work, we propose a novel extension to 4D Gaussian Splatting for dynamic scenes. Drawing on ideas from residual learning, we hierarchically decompose the dynamic scene into a "video-segment-frame" structure, with segments adjusted dynamically based on optical flow. Then, instead of directly predicting the time-dependent signals, we model each signal as the sum of video-constant values, segment-constant values, and frame-specific residuals. This allows more flexible models that adapt to highly variable scenes. We demonstrate state-of-the-art visual quality and real-time rendering on several established datasets, with the greatest improvements on complex scenes with large movements, occlusions, and fine details, where current methods degrade most.
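The three-level decomposition means a time-dependent signal at frame t is evaluated as the video constant, plus the constant of the segment containing t, plus the residual for frame t. The minimal Python sketch below assumes the signal is a single Gaussian's 3D position and stores all three levels as plain lookup tables; the class `CascadedResidual` and its fields are hypothetical, and in the actual method these terms would be optimized rather than given.

```python
from bisect import bisect_right
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


class CascadedResidual:
    """Hypothetical storage for one Gaussian's time-dependent position.

    position(t) = video constant
                + constant of the segment containing frame t
                + frame-specific residual for frame t
    All three levels are plain lookup tables here purely for illustration.
    """

    def __init__(self, video_const: Vec3, segment_starts: Sequence[int],
                 segment_consts: Sequence[Vec3], frame_residuals: Sequence[Vec3]):
        if not segment_starts or segment_starts[0] != 0:
            raise ValueError("the first segment must start at frame 0")
        if list(segment_starts) != sorted(segment_starts):
            raise ValueError("segment starts must be sorted")
        if len(segment_starts) != len(segment_consts):
            raise ValueError("need exactly one constant per segment")
        self.video_const = video_const
        self.segment_starts = list(segment_starts)
        self.segment_consts = list(segment_consts)
        self.frame_residuals = list(frame_residuals)

    def segment_of(self, t: int) -> int:
        # Index of the last segment whose start frame is <= t.
        return bisect_right(self.segment_starts, t) - 1

    def position(self, t: int) -> Vec3:
        seg = self.segment_consts[self.segment_of(t)]
        res = self.frame_residuals[t]
        # Sum the three levels channel by channel.
        return tuple(v + s + r for v, s, r in zip(self.video_const, seg, res))


if __name__ == "__main__":
    field = CascadedResidual(
        video_const=(1.0, 0.0, 0.0),
        segment_starts=[0, 3],  # frames 0-2 form segment 0, frames 3-4 segment 1
        segment_consts=[(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)],
        frame_residuals=[(0.0, 0.0, 0.0), (0.0, 0.25, 0.0), (0.0, 0.5, 0.0),
                         (0.0, 0.0, 0.0), (0.0, 0.0, 0.25)],
    )
    print(field.position(1))  # (1.0, 0.25, 0.0)
    print(field.position(4))  # (1.5, 0.0, 0.25)
```

In this sketch, motion shared by the whole video or by a whole segment lives in the coarser terms, so each per-frame residual only has to encode what those terms miss. This mirrors the residual-learning intuition the abstract appeals to.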