🤖 AI Summary
Existing methods struggle to achieve spatiotemporally consistent, photorealistic novel view synthesis in dynamic scenes involving high-speed motion and multiple non-rigidly deforming subjects, and they typically cannot render past time instances. This work proposes a neural volume rendering approach that models multi-view synchronized image sequences as a temporally rigidly transformed neural radiance field. By introducing a temporal archival mechanism, for the first time within a neural rendering framework, the method enables high-quality novel view synthesis and playback from arbitrary historical time steps. Experiments demonstrate that the proposed approach significantly outperforms state-of-the-art techniques on complex dynamic scenarios such as sports events and stage performances, achieving superior spatiotemporal consistency while enabling efficient retrospective rendering, analysis, and archival.
📝 Abstract
Camera virtualization -- an emerging solution to novel view synthesis -- holds transformative potential for visual entertainment, live performances, and sports broadcasting by generating photorealistic images from novel viewpoints using images from a limited set of calibrated static physical cameras. Despite recent advances, achieving spatially and temporally coherent, photorealistic rendering of dynamic scenes with efficient time-archival capabilities, particularly in fast-paced sports and stage performances, remains challenging for existing approaches. Recent methods based on 3D Gaussian Splatting (3DGS) for dynamic scenes can offer real-time view synthesis. Yet, they are hindered by their dependence on accurate 3D point clouds from structure-from-motion and by their inability to handle large, non-rigid, rapid motions of different subjects (e.g., flips, jumps, articulations, sudden player-to-player transitions). Moreover, independent motions of multiple subjects can break the Gaussian-tracking assumptions commonly used in 4DGS, ST-GS, and other dynamic splatting variants. This paper advocates reconsidering a neural volume rendering formulation for camera virtualization with efficient time-archival capabilities, making it useful for sports broadcasting and related applications. By modeling a dynamic scene as rigid transformations across multiple synchronized camera views at a given time, our method performs neural representation learning and provides enhanced visual rendering quality at test time. A key contribution of our approach is its support for time archival: users can revisit any past temporal instance of a dynamic scene and perform novel view synthesis, enabling retrospective rendering for replay, analysis, and archival of live events, a functionality absent in existing neural rendering and novel view synthesis approaches.
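To make the time-archival idea concrete, the sketch below is a minimal, hypothetical illustration (not the paper's implementation): a canonical radiance field is queried through a per-timestep rigid transform, and the per-timestep transform parameters are archived so that any past instant can be re-queried for retrospective rendering. The field itself is a toy stand-in; in the paper it would be a learned neural representation, and the rigid parameters would be estimated from the synchronized multi-view sequences.

```python
import numpy as np

def canonical_field(points):
    """Toy stand-in for a learned radiance field: density decays
    with distance from the canonical origin."""
    return np.exp(-np.linalg.norm(points, axis=-1))

def rigid_transform(points, R, t):
    """Map world-space sample points into the canonical frame
    via a rotation R and translation t."""
    return points @ R.T + t

class TimeArchivedScene:
    """Hypothetical time-archival wrapper: stores per-timestep rigid
    parameters so any archived instant can be revisited and re-rendered."""

    def __init__(self):
        self.archive = {}  # timestep -> (R, t)

    def record(self, step, R, t):
        # Archive the rigid parameters estimated for this timestep.
        self.archive[step] = (R, t)

    def query(self, step, points):
        # Retrieve the archived transform for an arbitrary past timestep
        # and evaluate the canonical field through it.
        R, t = self.archive[step]
        return canonical_field(rigid_transform(points, R, t))
```

Usage: after recording, say, step 0 with the identity transform, `scene.query(0, points)` reproduces the canonical field at those points, while later steps replay the scene under their own archived motion, which is what enables playback from arbitrary historical time instances.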