🤖 AI Summary
Deep image restoration must jointly address degradation removal, detail realism, and pixel-level consistency. Existing MSE-based, GAN-based, and diffusion-based methods resolve these challenges only partially, struggling to balance quality, fidelity, and efficiency. This paper introduces HYPIR, which initializes the restoration model with a pretrained diffusion model and then fine-tunes it with adversarial training in an end-to-end, single-forward-pass framework. By leveraging pretrained diffusion priors without diffusion loss, iterative sampling, or additional adapters, HYPIR avoids mode collapse, accelerates convergence, and improves training stability. It natively supports text-guided restoration and adjustable texture richness. Extensive experiments show that HYPIR consistently outperforms state-of-the-art methods across diverse restoration tasks, achieving a superior trade-off among reconstruction quality, structural fidelity, and inference speed (a single forward pass), while also substantially improving training efficiency.
📝 Abstract
Deep image restoration models aim to learn a mapping from the degraded image space to the natural image space. However, they face several critical challenges: removing degradation, generating realistic details, and ensuring pixel-level consistency. Over time, three major classes of methods have emerged: MSE-based, GAN-based, and diffusion-based. None of them achieves a good balance among restoration quality, fidelity, and speed. We propose a novel method, HYPIR, to address these challenges. Our solution pipeline is straightforward: we initialize the image restoration model with a pre-trained diffusion model and then fine-tune it with adversarial training. This approach does not rely on diffusion loss, iterative sampling, or additional adapters. We theoretically demonstrate that initializing adversarial training from a pre-trained diffusion model positions the initial restoration model very close to the natural image distribution. Consequently, this initialization improves numerical stability, avoids mode collapse, and substantially accelerates the convergence of adversarial training. Moreover, HYPIR inherits the rich user controls of diffusion models, enabling text-guided restoration and adjustable texture richness. Requiring only a single forward pass at inference, it converges faster during training and runs faster at inference than diffusion-based methods. Extensive experiments show that HYPIR outperforms previous state-of-the-art methods, achieving efficient and high-quality image restoration.
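To make the pipeline concrete, here is a deliberately tiny numerical sketch of the abstract's two claims: (1) a generator initialized near a pretrained image-to-image map starts much closer to the natural image distribution than a randomly initialized one, and (2) fine-tuning then proceeds with a plain adversarial objective only, with no diffusion loss or sampling loop, and the restored image comes from a single forward pass. Everything here is a hypothetical stand-in, not HYPIR's actual architecture: "images" are 8-dimensional Gaussian vectors, the generator is one linear layer, the "diffusion-pretrained" weights are simulated as near-identity, and the critic is logistic regression.

```python
import numpy as np

rng = np.random.default_rng(0)
d, B = 8, 256

# Toy "natural images": x ~ N(mu, I). Degradation adds noise.
mu = np.full(d, 2.0)
def sample_clean(n): return mu + rng.standard_normal((n, d))
def degrade(x): return x + 0.5 * rng.standard_normal(x.shape)

def restore(W, y):
    return y @ W  # single forward pass: degraded -> restored

def dist_to_natural(W, n=2048):
    """Crude proxy for distance between G's output distribution and the
    natural distribution (batch-mean gap; not a real divergence)."""
    out = restore(W, degrade(sample_clean(n)))
    return float(np.linalg.norm(out.mean(0) - mu))

# Simulated "diffusion-pretrained" init: already near a sensible map
# (identity plus small perturbation), vs. a purely random init.
W_pre = np.eye(d) + 0.01 * rng.standard_normal((d, d))
W_rnd = rng.standard_normal((d, d))

# Adversarial fine-tuning only: non-saturating GAN loss with a logistic
# critic, gradients written out by hand. No diffusion loss, no sampling.
def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))

w_c, b_c = 0.1 * rng.standard_normal(d), 0.0  # critic parameters
W = W_pre.copy()
for _ in range(100):
    # Critic ascent on E[log D(x)] + E[log(1 - D(G(y)))]
    x = sample_clean(B)
    xh = restore(W, degrade(sample_clean(B)))
    pr, pf = sigmoid(x @ w_c + b_c), sigmoid(xh @ w_c + b_c)
    w_c += 0.1 * (x * (1 - pr)[:, None] - xh * pf[:, None]).mean(0)
    b_c += 0.1 * float((1 - pr).mean() - pf.mean())
    # Generator descent on -E[log D(G(y))]: push restored outputs to "real"
    y = degrade(sample_clean(B))
    xh = restore(W, y)
    p = sigmoid(xh @ w_c + b_c)
    grad_out = -(1 - p)[:, None] * w_c  # dLoss/d(x_hat)
    W -= 0.05 * (y.T @ grad_out) / B
```

Under this toy setup, `dist_to_natural(W_pre)` is far smaller than `dist_to_natural(W_rnd)`, mirroring the paper's argument for why a diffusion-initialized generator gives the adversarial phase a stable, near-converged starting point rather than the fragile from-scratch dynamics of a standard GAN.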