🤖 AI Summary
Reconstructing high-fidelity 4D models of fast-moving scenes from low-frame-rate multi-view video remains challenging due to insufficient temporal sampling.
Method: We propose an asynchronous multi-view acquisition and generative inpainting framework. First, we design an asynchronous capture strategy—temporally staggering multiple 25 FPS cameras—to effectively achieve 100–200 FPS temporal sampling. Second, we introduce a novel video diffusion-based artifact removal network that jointly optimizes geometric completeness and spatiotemporal consistency under sparse-view conditions. Third, we integrate 4D sparse reconstruction, temporal alignment, and interpolation into an end-to-end pipeline for high-fidelity dynamic 4D reconstruction.
Results: Experiments demonstrate that our method significantly outperforms conventional synchronous acquisition—without requiring expensive high-speed cameras—yielding smoother motion trajectories, richer geometric detail, and more accurate dynamic structure recovery.
📝 Abstract
Reconstructing fast-dynamic scenes from multi-view videos is crucial for high-speed motion analysis and realistic 4D reconstruction. However, most 4D capture systems are limited to frame rates below 30 FPS (frames per second), and directly reconstructing high-speed motion in 4D from low-FPS input can produce undesirable results. In this work, we propose a high-speed 4D capture system that uses only low-FPS cameras, through novel capturing and processing modules. On the capturing side, we propose an asynchronous capture scheme that increases the effective frame rate by staggering the start times of the cameras. By grouping cameras and leveraging a base frame rate of 25 FPS, our method achieves an equivalent frame rate of 100–200 FPS without requiring specialized high-speed cameras. On the processing side, we propose a novel generative model to fix artifacts caused by sparse-view 4D reconstruction, since asynchrony reduces the number of viewpoints available at each timestamp. Specifically, we train a video-diffusion-based artifact-fix model for sparse 4D reconstruction, which restores missing details, maintains temporal consistency, and improves overall reconstruction quality. Experimental results demonstrate that our method significantly enhances high-speed 4D reconstruction compared to synchronous capture.
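The staggered-start timing behind the asynchronous capture scheme can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the group count are hypothetical; only the base rate of 25 FPS and the 100–200 FPS effective range come from the abstract.

```python
# Illustrative sketch of asynchronous multi-camera capture timing.
# Assumption: cameras are split into `num_groups` groups, each group
# offset by an equal fraction of one frame period, so the interleaved
# streams sample time at num_groups * base_fps.

def stagger_offsets(num_groups: int, base_fps: float) -> list[float]:
    """Start-time offsets (seconds) for each camera group."""
    frame_period = 1.0 / base_fps
    return [g * frame_period / num_groups for g in range(num_groups)]

def effective_fps(num_groups: int, base_fps: float) -> float:
    """Temporal sampling rate of the combined, interleaved streams."""
    return num_groups * base_fps

# Four groups of 25 FPS cameras: offsets of roughly 0, 10, 20, 30 ms,
# giving 100 FPS combined temporal sampling (eight groups would give 200).
offsets = stagger_offsets(4, 25.0)
```

Each timestamp in the merged stream is then observed by only one group's viewpoints, which is the sparse-view condition the artifact-fix model addresses.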