🤖 AI Summary
Image inverse problems involve a trade-off between two families of methods with complementary limitations: supervised (training-based) methods reconstruct with high quality but lack flexibility, while zero-shot methods are flexible but lower in quality. To balance reconstruction quality and generalization, this paper proposes a degradation-aware diffusion denoising framework. Methodologically, it integrates differentiable degradation operators into every block of the denoiser, applying them to the network activations, and uses an attention-based conditioning mechanism in each block to fuse the resulting degradation information at multiple depths. The same architecture supports three inference modes: minimum mean squared error (MMSE) estimation, posterior sampling, and neural posterior principal component (PCA) estimation. Evaluated on FFHQ and ImageNet, the method achieves state-of-the-art posterior sampling performance, outperforming both fully supervised and zero-shot baselines, and offers a favorable trade-off among accuracy, generalization across degradation types, and computational efficiency.
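As a rough illustration of how the three inference modes relate, the sketch below summarizes a set of posterior samples for a single measurement. The sample mean approximates the MMSE estimate (the posterior mean E[x | y]), and the leading eigenvectors of the sample covariance are the posterior principal components, with their variances. This is a Monte Carlo stand-in under stated assumptions, not the paper's method: according to the abstract, the proposed network operates directly as an MMSE or principal-component estimator. The function name `posterior_summaries` is hypothetical.

```python
import numpy as np


def posterior_summaries(samples: np.ndarray, k: int):
    """Summarize posterior samples of shape (n_samples, dim).

    Returns:
        mean:      sample mean, an estimate of the MMSE solution E[x | y]
        pcs:       (k, dim) top-k principal directions of the sample covariance
        variances: (k,) posterior variance along each direction
    """
    n = samples.shape[0]
    mean = samples.mean(axis=0)
    centered = samples - mean
    # The right singular vectors of the centered samples are the eigenvectors
    # of the sample covariance centered.T @ centered / (n - 1).
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    variances = s[:k] ** 2 / (n - 1)
    return mean, vt[:k], variances
```

For example, samples from a 2-D Gaussian with standard deviations 3 and 0.5 around (1, -2) yield a mean close to (1, -2), a first component close to the x-axis, and variances close to 9 and 0.25.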
📝 Abstract
Diffusion models have demonstrated remarkable capabilities in handling inverse problems, offering high-quality solutions based on posterior sampling. Despite significant advances, a fundamental trade-off persists in how conditional synthesis is performed: training-based methods achieve high-quality results, while zero-shot approaches give up some of this quality in exchange for flexibility. This work introduces a framework that combines the best of both worlds: the strong performance of supervised approaches and the flexibility of zero-shot methods. This is achieved through a novel architectural design that integrates the degradation operator directly into the denoiser. In each block, the proposed architecture applies the degradation operator to the network activations and conditions the block's output through an attention mechanism, enabling adaptation to diverse degradation scenarios while maintaining high performance. The same architecture can operate as a general MMSE estimator, a posterior sampler, or a Neural Posterior Principal Component estimator; this flexibility enables a wide range of downstream tasks and highlights the broad applicability of the framework. The proposed modification of the denoiser network offers a versatile, accurate, and computationally efficient solution, demonstrating the advantages of dedicated network architectures for complex inverse problems. Experimental results on the FFHQ and ImageNet datasets demonstrate state-of-the-art posterior-sampling performance, surpassing both training-based and zero-shot alternatives.
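The abstract describes the degradation-aware block only at a high level, so the sketch below is one hypothetical reading of it, not the authors' architecture. It assumes that the degradation operator A is 2x2 average pooling (a super-resolution-style operator), that the measurement y has already been lifted to C channels at the resolution of A(h), and that conditioning is a single-head cross-attention in which every spatial position of the activations h queries tokens built from A(h) concatenated with y. All names (`downsample2x`, `degradation_aware_block`, `wq`, `wk`, `wv`) are illustrative.

```python
import numpy as np


def downsample2x(x: np.ndarray) -> np.ndarray:
    """Hypothetical degradation operator A: 2x2 average pooling per channel.
    Expects x of shape (C, H, W) with even H and W."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def degradation_aware_block(h, y, degrade, wq, wk, wv):
    """Illustrative block: apply A to the activations, then cross-attend.

    h:  activations, shape (C, H, W)
    y:  measurement lifted to C channels, same shape as degrade(h)
    wq: (C, D) query projection; wk: (2C, D) key projection;
    wv: (2C, C) value projection (all hypothetical, normally learned)
    """
    c = h.shape[0]
    ah = degrade(h)                                # A applied to the activations
    queries = h.reshape(c, -1).T @ wq              # (H*W, D), one query per pixel
    cond = np.concatenate([ah, y], axis=0)         # (2C, H', W')
    cond_tokens = cond.reshape(2 * c, -1).T        # (H'*W', 2C)
    keys = cond_tokens @ wk                        # (H'*W', D)
    values = cond_tokens @ wv                      # (H'*W', C)
    attn = softmax(queries @ keys.T / np.sqrt(keys.shape[1]), axis=-1)
    update = (attn @ values).T.reshape(h.shape)    # back to (C, H, W)
    return h + update                              # residual connection
```

Setting `wv` to zeros turns the block into the identity map, which is a convenient sanity check. A practical denoiser block would instead learn these projections and would typically add multiple heads, normalization, and timestep conditioning.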