InvFusion: Bridging Supervised and Zero-shot Diffusion for Inverse Problems

📅 2025-04-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the challenge of balancing reconstruction quality and generalization in image inverse problems—where supervised learning and zero-shot methods exhibit complementary limitations—this paper proposes a degradation-aware plug-and-play diffusion denoising framework. Methodologically, it integrates differentiable degradation operators into multiple layers of the denoiser and introduces a cross-layer attention-based conditioning mechanism to enable fine-grained fusion of degradation information. Furthermore, it establishes a unified multi-objective probabilistic modeling framework supporting three inference paradigms: minimum mean squared error (MMSE) estimation, posterior sampling, and neural posterior PCA. Evaluated on FFHQ and ImageNet, the method achieves state-of-the-art posterior sampling performance, significantly outperforming both fully supervised and zero-shot baselines. It attains a superior trade-off among accuracy, generalization across degradation types, and computational efficiency.
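The three inference modes named above (MMSE estimation, posterior sampling, and posterior principal components) can be illustrated on a toy problem where all three have closed forms. The sketch below is not the paper's method: it assumes a linear-Gaussian model, x ~ N(0, I) and y = Ax + noise with standard deviation sigma, and the helper name `gaussian_posterior` and all sizes are hypothetical choices made for illustration only.

```python
import numpy as np

def gaussian_posterior(A, y, sigma):
    """Closed-form posterior of x given y for x ~ N(0, I), y = A x + N(0, sigma^2 I).

    Hypothetical helper for illustration; the paper uses a learned denoiser instead.
    """
    n = A.shape[1]
    precision = A.T @ A / sigma**2 + np.eye(n)  # posterior precision matrix
    cov = np.linalg.inv(precision)              # posterior covariance
    mean = cov @ A.T @ y / sigma**2             # posterior mean
    return mean, cov

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 4))                 # 2 measurements of a 4-dim signal
y = A @ rng.standard_normal(4) + 0.1 * rng.standard_normal(2)

# 1) MMSE estimate: the posterior mean.
x_mmse, cov = gaussian_posterior(A, y, sigma=0.1)

# 2) Posterior sample: mean plus correlated Gaussian noise.
x_sample = x_mmse + np.linalg.cholesky(cov) @ rng.standard_normal(4)

# 3) Posterior principal components: eigenvectors of the posterior covariance,
#    sorted by ascending eigenvalue, so the last column is the top direction.
eigvals, eigvecs = np.linalg.eigh(cov)
top_pc = eigvecs[:, -1]
```

In this toy setting the top principal component lies in the null space of `A`, the direction in which the measurement carries no information, which is the kind of uncertainty a posterior PCA estimator is meant to expose.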

📝 Abstract

Diffusion models have demonstrated remarkable capabilities in handling inverse problems, offering high-quality posterior-sampling-based solutions. Despite significant advances, a fundamental trade-off persists in how the conditioned synthesis is carried out: training-based methods achieve high-quality results, while zero-shot approaches give up some of that quality in exchange for flexibility. This work introduces a framework that combines the best of both worlds: the strong performance of supervised approaches and the flexibility of zero-shot methods. This is achieved through a novel architectural design that integrates the degradation operator directly into the denoiser. In each block, our proposed architecture applies the degradation operator to the network activations and conditions the output using the attention mechanism, enabling adaptation to diverse degradation scenarios while maintaining high performance. Our work demonstrates the versatility of the proposed architecture, which can operate as a general MMSE estimator, a posterior sampler, or a Neural Posterior Principal Component estimator. This flexibility enables a wide range of downstream tasks. The proposed modification of the denoiser network offers a versatile, accurate, and computationally efficient solution, demonstrating the advantages of dedicated network architectures for complex inverse problems. Experimental results on the FFHQ and ImageNet datasets demonstrate state-of-the-art posterior-sampling performance, surpassing both training-based and zero-shot alternatives.
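One possible reading of the block described in the abstract (apply the degradation operator to the activations, then condition through attention) is sketched below in NumPy. This is a minimal illustration under stated assumptions, not the authors' architecture: the function name `degradation_aware_block`, the use of a dense matrix `A` as the operator, the per-channel residual against the measurement `y`, the adjoint mapping back to pixel space, and the single-head attention layout are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def degradation_aware_block(h, y, A, Wq, Wk, Wv):
    """Hypothetical block: h is (C, N) activations, y is (M,), A is (M, N).

    Wq and Wk are (C, D) projections, Wv is (C, C). All names are assumptions.
    """
    # Apply the degradation operator to every channel of the activations.
    Ah = h @ A.T                      # (C, M)
    # Assumption: compare each degraded channel with the measurement.
    r = Ah - y[None, :]               # (C, M)
    # Assumption: map back to pixel space with the adjoint of A.
    g = r @ A                         # (C, N)
    # Pixels act as tokens: queries from h, keys and values from g.
    q = h.T @ Wq                      # (N, D)
    k = g.T @ Wk                      # (N, D)
    v = g.T @ Wv                      # (N, C)
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=-1)  # (N, N)
    # Residual update keeps the block close to the unconditioned denoiser.
    return h + (attn @ v).T           # (C, N)

rng = np.random.default_rng(0)
C, N, M, D = 3, 5, 2, 4
h = rng.standard_normal((C, N))
A = rng.standard_normal((M, N))
y = rng.standard_normal(M)
out = degradation_aware_block(
    h, y, A,
    rng.standard_normal((C, D)),
    rng.standard_normal((C, D)),
    rng.standard_normal((C, C)),
)
```

Because `A` is passed in at call time rather than baked into the weights, the same block can in principle be evaluated with a different operator at inference, which is the flexibility the abstract attributes to zero-shot methods.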

Problem

Research questions and friction points this paper is trying to address.

Combines supervised and zero-shot diffusion for inverse problems

Integrates degradation operator into denoiser for flexibility

Achieves state-of-the-art posterior-sampling performance on the FFHQ and ImageNet datasets

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates degradation operator into denoiser

Uses attention for diverse degradation adaptation

Combines supervised and zero-shot diffusion strengths

🔎 Similar Papers

RevCD - Reversed Conditional Diffusion for Generalized Zero-Shot Learning