🤖 AI Summary
Problem: Existing diffusion-based methods for spatiotemporal inverse problems, such as video super-resolution and deblurring, are hindered by the high training cost and poor generalization of dedicated video diffusion models.
Method: We propose a novel fine-tuning-free paradigm that leverages only a pre-trained image diffusion model. Specifically, we map the video’s temporal dimension to the batch dimension and introduce a noise-synchronized batch-consistency sampling strategy to preserve inter-frame temporal coherence. Building upon this, we design an iterative reverse-diffusion framework integrating a Decomposed Diffusion Sampler (DDS), spatiotemporal batch optimization, and noise synchronization constraints.
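The noise-synchronization idea can be sketched as follows: during stochastic reverse sampling, a single noise tensor is drawn and shared across the frame (batch) axis so that all frames receive the same perturbation. This is a minimal illustrative sketch, not the paper's code; `batch_consistent_noise` and `ddim_step` are hypothetical names, and the update is a standard simplified DDIM step.

```python
import numpy as np

def batch_consistent_noise(num_frames, frame_shape, rng):
    """Draw ONE noise sample and broadcast it across the frame (batch)
    axis, so every frame receives an identical stochastic perturbation."""
    z = rng.standard_normal(frame_shape)  # single shared noise sample
    return np.broadcast_to(z, (num_frames, *frame_shape)).copy()

def ddim_step(x_t, x0_hat, alpha_t, alpha_prev, eta, shared_noise):
    """One simplified DDIM-style reverse step applied to the whole
    spatiotemporal batch, using the shared (synchronized) noise."""
    eps = (x_t - np.sqrt(alpha_t) * x0_hat) / np.sqrt(1.0 - alpha_t)
    sigma = (eta
             * np.sqrt((1.0 - alpha_prev) / (1.0 - alpha_t))
             * np.sqrt(1.0 - alpha_t / alpha_prev))
    dir_coeff = np.sqrt(np.maximum(1.0 - alpha_prev - sigma**2, 0.0))
    return np.sqrt(alpha_prev) * x0_hat + dir_coeff * eps + sigma * shared_noise
```

Because the stochastic term `sigma * shared_noise` is identical for every frame, frame-to-frame differences come only from the content itself, which is what encourages temporal coherence.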
Contribution/Results: Our approach eliminates the need for costly video diffusion model training while achieving state-of-the-art performance across multiple video inverse tasks. It significantly improves both reconstruction fidelity and spatiotemporal consistency, demonstrating strong generalization without task-specific adaptation.
📝 Abstract
Recently, diffusion model-based inverse problem solvers (DIS) have emerged as state-of-the-art approaches for addressing inverse problems such as image super-resolution, deblurring, and inpainting. However, their application to video inverse problems arising from spatio-temporal degradation remains largely unexplored due to the challenges in training video diffusion models. To address this issue, here we introduce an innovative video inverse solver that leverages only image diffusion models. Specifically, drawing inspiration from the success of the recent decomposed diffusion sampler (DDS), our method treats the time dimension of a video as the batch dimension of image diffusion models and solves spatio-temporal optimization problems within denoised spatio-temporal batches derived from each image diffusion model. Moreover, we introduce a batch-consistent diffusion sampling strategy that encourages consistency across batches by synchronizing the stochastic noise components in image diffusion models. Our approach synergistically combines batch-consistent sampling with simultaneous optimization of denoised spatio-temporal batches at each reverse diffusion step, resulting in a novel and efficient diffusion sampling strategy for video inverse problems. Experimental results demonstrate that our method effectively addresses various spatio-temporal degradations in video inverse problems, achieving state-of-the-art reconstructions. Project page: https://svi-diffusion.github.io/
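The spatio-temporal optimization over the denoised batch can be sketched as a few conjugate-gradient steps toward the data-consistency objective min_x ||A(x) - y||², initialized at the denoised batch, in the spirit of DDS. This is an illustrative sketch only; `dds_data_consistency` is a hypothetical name, and `A`/`AT` stand in for a generic degradation operator and its adjoint, which the paper instantiates per task.

```python
import numpy as np

def dds_data_consistency(x0_hat, A, AT, y, n_cg=5):
    """A few conjugate-gradient steps on the normal equations
    A^T A x = A^T y, initialized at the denoised spatiotemporal
    batch x0_hat (simplified DDS-style data-consistency update)."""
    x = x0_hat.copy()
    r = AT(y) - AT(A(x))      # residual of the normal equations
    p = r.copy()
    rs = np.vdot(r, r)
    for _ in range(n_cg):
        Ap = AT(A(p))
        alpha = rs / np.vdot(p, Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = np.vdot(r, r)
        if np.sqrt(rs_new) < 1e-10:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```

For example, with `A` as a temporal 2x downsampling (averaging adjacent frames) and `AT` its adjoint, each call pulls the whole frame batch jointly toward the measurements while the diffusion prior supplies the initialization.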