🤖 AI Summary
This work addresses the joint restoration of multiple degraded images of the same scene—e.g., images suffering from motion blur or low resolution—by proposing the first multi-view diffusion model tailored to sparse image collections. Methodologically, it introduces multi-view geometric consistency into the diffusion framework, incorporating a cross-image feature alignment module and a 3D consistency regularization loss, and leverages implicit scene priors to enable geometry-aware joint deblurring and super-resolution. Compared with single-image and video-based approaches, the model achieves superior reconstruction quality and inter-view consistency, attaining state-of-the-art performance across multiple benchmarks. Moreover, the generated outputs effectively support downstream 3D reconstruction and camera pose estimation, demonstrating the efficacy and generalization capability of multi-view diffusion modeling for ill-posed inverse problems.
📝 Abstract
The computer vision community has developed numerous techniques for digitally restoring true scene information from single-view degraded photographs, an important yet extremely ill-posed task. In this work, we tackle image restoration from a different perspective by jointly denoising multiple photographs of the same scene. Our core hypothesis is that degraded images capturing a shared scene contain complementary information that, when combined, better constrains the restoration problem. To this end, we implement a powerful multi-view diffusion model that jointly generates uncorrupted views by extracting rich information from multi-view relationships. Our experiments show that our multi-view approach outperforms existing single-image and even video-based methods on image deblurring and super-resolution tasks. Critically, our model is trained to output 3D-consistent images, making it a promising tool for applications requiring robust multi-view integration, such as 3D reconstruction or pose estimation.
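The cross-image feature alignment mentioned in the summary can be illustrated with a toy sketch: each view's tokens attend over the pooled tokens of all views, so complementary information from the other degraded photographs flows into every view's features. This is a minimal NumPy illustration under assumed shapes and without learned projections, not the paper's actual module, which would sit inside a diffusion backbone with trained query/key/value weights.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_view_attention(feats):
    """Toy cross-image feature alignment (hypothetical shapes).

    feats: (V, T, D) array — V views, T tokens per view, D channels.
    Every token attends over the tokens of ALL views, mixing
    complementary evidence across the degraded photographs.
    """
    V, T, D = feats.shape
    pooled = feats.reshape(V * T, D)                     # tokens from every view
    attn = softmax(feats @ pooled.T / np.sqrt(D))        # (V, T, V*T) weights
    return attn @ pooled                                 # (V, T, D) aligned feats

rng = np.random.default_rng(0)
views = rng.normal(size=(3, 8, 16))   # 3 views, 8 tokens each, 16-dim features
aligned = cross_view_attention(views)
print(aligned.shape)                  # (3, 8, 16)
```

Because the attention weights for each token sum to one, each output feature is a convex combination of tokens drawn from all views, which is the basic mechanism by which multi-view redundancy constrains the restoration.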