Feed-Forward Bullet-Time Reconstruction of Dynamic Scenes from Monocular Videos

📅 2024-12-04
🏛️ arXiv.org
📈 Citations: 2
✨ Influential: 0
📄 PDF
🤖 AI Summary
Existing static feed-forward scene reconstruction methods suffer from poor generalization and fail to model dynamic content effectively. To address this, we propose the first motion-aware feed-forward framework for dynamic scene reconstruction, enabling real-time bullet-time rendering and novel-view synthesis from monocular video input. Our approach employs a 3D Gaussian splatting representation integrated with a cross-frame spatiotemporal aggregation mechanism, jointly modeling static backgrounds and dynamic foregrounds without iterative optimization. The model processes monocular video end-to-end and reconstructs the entire scene within 150 ms, significantly outperforming optimization-based methods in speed. It achieves state-of-the-art performance on both static and dynamic benchmarks, delivering strong generalization, high-fidelity reconstruction, and low inference latency.
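The core idea in the summary, aggregating information from all context frames toward a target "bullet" timestamp and decoding the result into Gaussian parameters, can be illustrated with a toy sketch. This is not the authors' code: the temporal-softmax weighting, feature shapes, and decoder layout below are illustrative assumptions, standing in for the paper's learned cross-frame spatiotemporal aggregation.

```python
import numpy as np

def aggregate_bullet_time(frame_feats, frame_times, bullet_time, tau=0.1):
    """Softmax-weight context-frame features by temporal distance to bullet_time.

    frame_feats: (T, N, C) features for T context frames, N pixels, C channels
    frame_times: (T,) timestamps of the context frames, normalized to [0, 1]
    bullet_time: scalar target ("bullet") timestamp in [0, 1]
    """
    # Negative squared temporal distance: frames closer in time to the
    # bullet timestamp receive larger softmax weights.
    logits = -((frame_times - bullet_time) ** 2) / tau
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    # Weighted sum over the frame axis -> one aggregated feature per pixel.
    return np.tensordot(weights, frame_feats, axes=(0, 0))  # (N, C)

def decode_gaussians(agg_feats):
    """Toy decoder: split aggregated channels into per-pixel Gaussian params."""
    centers = agg_feats[:, :3]                        # xyz means
    scales = np.exp(agg_feats[:, 3:6])                # exp keeps scales positive
    opacity = 1.0 / (1.0 + np.exp(-agg_feats[:, 6]))  # sigmoid keeps opacity in (0, 1)
    return centers, scales, opacity
```

In the actual model the aggregation weights are learned by a network rather than fixed by a distance kernel, but the single feed-forward pass (no per-scene iterative optimization) is what enables the reported 150 ms reconstruction time.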

๐Ÿ“ Abstract
Recent advancements in static feed-forward scene reconstruction have demonstrated significant progress in high-quality novel view synthesis. However, these models often struggle with generalizability across diverse environments and fail to effectively handle dynamic content. We present BTimer (short for BulletTimer), the first motion-aware feed-forward model for real-time reconstruction and novel view synthesis of dynamic scenes. Our approach reconstructs the full scene in a 3D Gaussian Splatting representation at a given target ('bullet') timestamp by aggregating information from all the context frames. Such a formulation allows BTimer to gain scalability and generalization by leveraging both static and dynamic scene datasets. Given a casual monocular dynamic video, BTimer reconstructs a bullet-time scene within 150ms while reaching state-of-the-art performance on both static and dynamic scene datasets, even compared with optimization-based approaches.
Problem

Research questions and friction points this paper is trying to address.

Reconstructing dynamic scenes from monocular videos in real time
Improving generalization across diverse static and dynamic environments
Enabling high-quality novel view synthesis with 3D Gaussian Splatting
Innovation

Methods, ideas, or system contributions that make the work stand out.

Motion-aware feed-forward model for dynamic scenes
3D Gaussian Splatting representation for reconstruction
Real-time bullet-time scene reconstruction within 150 ms