AI Summary
To address the ill-posedness of monocular dynamic smoke 3D reconstruction caused by severe viewpoint ambiguity, this paper proposes a physics-guided diffusion-model framework. Methodologically, it integrates diffusion priors with differentiable Navier-Stokes modeling, incorporating divergence and curl constraints on the velocity field, explicit advection optimization, and progressive multi-view rendering to jointly optimize the density field, velocity field, and smoke-source location from a single input video. Its key innovations are: (i) the first incorporation of physical constraints directly into the diffusion generation process, enabling end-to-end co-optimization of generative priors and fluid dynamics; and (ii) the alleviation of monocular information deficiency via differentiable view synthesis and iterative density refinement. On standard benchmarks, the method surpasses state-of-the-art approaches in density-field PSNR, velocity-field angular error, and smoke-source localization accuracy; qualitative results further demonstrate superior spatiotemporal coherence and realism.
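The divergence and curl constraints on the velocity field mentioned above can be sketched as finite-difference penalty terms on a discretized 2D velocity field. This is a minimal illustration under assumed grid conventions, not the paper's implementation; the function names, weights, and the optional vorticity-guidance target are placeholders:

```python
import numpy as np

def divergence_2d(u, v, h):
    """Central-difference divergence du/dx + dv/dy on a uniform grid
    with spacing h; arrays are indexed (y, x)."""
    return np.gradient(u, h, axis=1) + np.gradient(v, h, axis=0)

def curl_2d(u, v, h):
    """Scalar curl (vorticity) dv/dx - du/dy of a 2D velocity field."""
    return np.gradient(v, h, axis=1) - np.gradient(u, h, axis=0)

def physics_penalty(u, v, h, vort_target=None, w_div=1.0, w_curl=0.1):
    """Soft physics penalty: drive the divergence to zero (incompressibility)
    and, optionally, the vorticity toward a target field used as a guidance
    signal. The weights and target are illustrative placeholders."""
    loss = w_div * np.mean(divergence_2d(u, v, h) ** 2)
    if vort_target is not None:
        loss += w_curl * np.mean((curl_2d(u, v, h) - vort_target) ** 2)
    return loss
```

A velocity field derived from a stream function (hence divergence-free) incurs a near-zero penalty, while a compressing field is penalized, which is the behavior such a guidance term relies on during generation.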
Abstract
Reconstructing dynamic fluids from sparse views is a long-standing and challenging problem, owing to the severe lack of 3D information under insufficient view coverage. While several pioneering approaches have attempted to address this issue using differentiable rendering or novel view synthesis, they are often limited by time-consuming optimization and refinement under ill-posed conditions. To tackle these challenges, we propose SmokeSVD, an efficient and effective framework that progressively generates and reconstructs dynamic smoke from a single video by integrating the powerful generative capabilities of diffusion models with physically guided consistency optimization toward realistic appearance and dynamic evolution. Specifically, we first propose a physically guided side-view synthesizer based on diffusion models, which explicitly incorporates divergence and gradient guidance of velocity fields to generate visually realistic and spatio-temporally consistent side-view images frame by frame, significantly alleviating the ill-posedness of single-view reconstruction without imposing additional constraints. Subsequently, we obtain a rough estimate of the density field from the pair of front-view input and synthesized side-view images, and further refine the blurry 2D novel-view images and the coarse-grained 3D density field through an iterative process that progressively renders and enhances images from an increasing range of novel viewing angles, producing high-quality multi-view image sequences. Finally, we reconstruct and estimate the fine-grained density field, velocity field, and smoke source via differentiable advection based on the Navier-Stokes equations. Extensive quantitative and qualitative experiments show that our approach achieves high-quality reconstruction and outperforms previous state-of-the-art techniques.
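The differentiable advection used in the final reconstruction stage can be illustrated by a first-order semi-Lagrangian update of the density field. This is a simplified 2D NumPy/SciPy sketch under assumed conventions (velocities measured in grid cells per unit time); the paper's pipeline would implement an equivalent step in a differentiable framework so that gradients flow back to the velocity field and the smoke source:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def advect_density(rho, u, v, dt):
    """One semi-Lagrangian advection step: trace each grid cell backward
    along the velocity field and sample the density at the departure point
    with bilinear interpolation. Arrays are indexed (y, x)."""
    ny, nx = rho.shape
    yy, xx = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")
    # Backtrace departure points; clamp at the boundary via mode="nearest".
    x_back = xx - dt * u
    y_back = yy - dt * v
    return map_coordinates(rho, [y_back, x_back], order=1, mode="nearest")
```

For example, a density blob advected by a uniform rightward velocity shifts one cell to the right per unit time; in an optimization loop, the mismatch between the advected field and the next frame's estimated density would drive updates to the velocity field.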