🤖 AI Summary
Spatially varying image deblurring is a highly ill-posed problem, especially when complex, spatially varying motion blur is coupled with noise. Existing methods face a fundamental trade-off: model-based deep unfolding approaches enforce strong physical constraints but suffer from oversmoothing and artifacts, while generative models achieve superior perceptual quality yet often hallucinate unrealistic details. To bridge this gap, we propose a physics-guided, diffusion-based deblurring framework. First, we estimate a dense, high-dimensional compressed representation of the spatially varying blur kernel field, which serves as a rigorous, physically grounded prior. Second, we condition a diffusion model on this estimated degradation field and employ ControlNet to steer the denoising sampling process, thereby enforcing strict physical consistency while enhancing texture fidelity. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art approaches on severely blurred images, effectively mitigating both oversmoothing and hallucination and achieving a unified balance between physical accuracy and visual realism.
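To make the two-stage idea concrete, below is a minimal PyTorch sketch of degradation-field conditioning: a stage-1 network predicts a dense per-pixel compressed kernel descriptor, and a ControlNet-style branch injects it into a diffusion denoiser through a zero-initialized projection. All module names, channel sizes, and the toy denoiser are illustrative assumptions, not the paper's actual architecture.

```python
# Illustrative sketch only: a per-pixel kernel-descriptor field used as
# ControlNet-style conditioning for a diffusion denoiser. Shapes and
# modules are placeholders, not the paper's implementation.
import torch
import torch.nn as nn

class KernelFieldEstimator(nn.Module):
    """Stage 1: predict a dense per-pixel compressed kernel descriptor
    (desc_ch values per pixel) from the blurry observation."""
    def __init__(self, in_ch=3, desc_ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, desc_ch, 3, padding=1),
        )

    def forward(self, blurry):
        return self.net(blurry)  # (B, desc_ch, H, W) degradation field

class ControlBranch(nn.Module):
    """Stage 2 (guidance): encode the degradation field and project it
    through a zero-initialized 1x1 conv, the ControlNet trick that makes
    the added guidance a no-op at the start of training."""
    def __init__(self, desc_ch=32, feat_ch=64):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv2d(desc_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.zero_proj = nn.Conv2d(feat_ch, feat_ch, 1)
        nn.init.zeros_(self.zero_proj.weight)
        nn.init.zeros_(self.zero_proj.bias)

    def forward(self, field):
        return self.zero_proj(self.encode(field))

class ToyDenoiser(nn.Module):
    """Stand-in for the diffusion U-Net: predicts noise from the noisy
    sample, a normalized timestep, and the injected control features."""
    def __init__(self, img_ch=3, feat_ch=64):
        super().__init__()
        # +1 input channel for a broadcast timestep map
        self.in_conv = nn.Conv2d(img_ch + 1, feat_ch, 3, padding=1)
        self.body = nn.Sequential(
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.out_conv = nn.Conv2d(feat_ch, img_ch, 3, padding=1)

    def forward(self, x_t, t, control):
        t_map = t.view(-1, 1, 1, 1).expand(-1, 1, *x_t.shape[-2:])
        h = self.in_conv(torch.cat([x_t, t_map], dim=1))
        h = h + control  # additive physics-guided conditioning
        return self.out_conv(self.body(h))

# One denoising step conditioned on the estimated degradation field.
blurry = torch.randn(1, 3, 64, 64)
x_t = torch.randn(1, 3, 64, 64)         # current noisy sample
t = torch.tensor([0.5])                  # normalized diffusion time
field = KernelFieldEstimator()(blurry)   # dense kernel descriptor field
ctrl = ControlBranch()(field)
eps_hat = ToyDenoiser()(x_t, t, ctrl)    # noise prediction for this step
```

The zero-initialized projection is the key design choice borrowed from ControlNet: at initialization the conditioning branch contributes nothing, so the pretrained generative prior is preserved and the physical guidance is learned gradually.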
📝 Abstract
Spatially varying image deblurring remains a fundamentally ill-posed problem, especially when degradations arise from complex mixtures of motion and other forms of blur under significant noise. State-of-the-art learning-based approaches generally fall into two paradigms: model-based deep unrolling methods, which enforce physical constraints by explicitly modeling the degradations but often produce over-smoothed, artifact-laden textures; and generative models, which achieve superior perceptual quality yet hallucinate details due to weak physical constraints. In this paper, we propose a novel framework that reconciles these paradigms by taming a powerful generative prior with explicit, dense physical constraints. Rather than oversimplifying the degradation field, we model it as a dense continuum of high-dimensional compressed kernels, ensuring that even minute variations in motion and other degradation patterns are captured. We leverage this rich descriptor field to condition a ControlNet architecture, which strongly guides the diffusion sampling process. Extensive experiments demonstrate that our method effectively bridges the gap between physical accuracy and perceptual realism, outperforming state-of-the-art model-based methods as well as generative baselines in challenging, severely blurred scenarios.
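For reference, the standard spatially varying degradation model underlying this setting (a textbook formulation, not quoted from the paper) assigns each pixel its own local blur kernel:

$$
y(\mathbf{x}) \;=\; \sum_{\mathbf{s} \in \Omega} k_{\mathbf{x}}(\mathbf{s})\, u(\mathbf{x} - \mathbf{s}) \;+\; n(\mathbf{x}),
$$

where $u$ is the sharp image, $y$ the blurry observation, $k_{\mathbf{x}}$ the blur kernel acting at pixel $\mathbf{x}$ with support $\Omega$, and $n$ the noise. The dense descriptor field described above can be read as a per-pixel compressed encoding of $k_{\mathbf{x}}$, which is what makes conditioning the diffusion model on the full kernel continuum tractable.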