🤖 AI Summary
Dynamic novel view synthesis (NVS) for dynamic scenes suffers from high memory overhead, poor model scalability, inefficient training, and rendering artifacts. To address these challenges, this work proposes the first continual learning framework tailored for dynamic scenes, introducing dynamic neural graphics primitives and a spatiotemporal-decoupled hash encoding fusion mechanism to enable parameter reuse and implicit representation optimization. Furthermore, we establish the first benchmark dataset featuring ultra-long multi-view videos with complex rigid and non-rigid motions. Our method reduces peak training memory by 85% (<14 GB) and compresses streaming bandwidth to <0.4 MB/frame. Quantitative and qualitative evaluations demonstrate substantial improvements in reconstruction fidelity and scalability over state-of-the-art methods.
📝 Abstract
Current methods for novel view synthesis (NVS) in dynamic scenes encounter significant challenges in managing memory consumption, model complexity, training efficiency, and rendering fidelity. Existing offline techniques, while delivering high-quality results, face challenges from substantial memory demands and limited scalability. Conversely, online methods struggle to balance rapid convergence with model compactness. To address these issues, we propose continual dynamic neural graphics primitives (CD-NGP). Our approach leverages a continual learning framework to reduce memory overhead, and it also integrates features from distinct temporal and spatial hash encodings for high rendering quality. Meanwhile, our method employs parameter reuse to achieve high scalability. Additionally, we introduce a novel dataset featuring multi-view, exceptionally long video sequences with substantial rigid and non-rigid motion, which is seldom possessed by existing datasets. We evaluate the reconstruction quality, speed and scalability of our method on both the established public datasets and our exceptionally long video dataset. Notably, our method achieves an $85%$ reduction in training memory consumption (less than 14GB) compared to offline techniques and significantly lowers streaming bandwidth requirements (less than 0.4MB/frame) relative to other online alternatives. The experimental results on our long video sequences dataset show the superior scalability and reconstruction quality compared to existing state-of-the-art approaches.