Your Pre-trained Diffusion Model Secretly Knows Restoration

📅 2026-04-06
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the challenge of unlocking the inherent image and video restoration capabilities of pre-trained diffusion models without fine-tuning or introducing additional control modules. The authors propose a lightweight prompt learning strategy that directly optimizes learnable prompt embeddings at the output of the text encoder, coupled with a diffusion bridge mechanism to align training and inference dynamics. This approach effectively activates the model's latent restoration behavior while leaving the backbone architecture unchanged. It is readily applicable to both pre-trained WAN video models and FLUX image models, demonstrating superior restoration performance and strong generalization across diverse degradation types, significantly outperforming existing baselines based on textual prompting or embedding optimization.
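The core idea in the summary, optimizing prompt embeddings at the text-encoder output while the diffusion backbone stays frozen, can be sketched with a toy PyTorch setup. Everything here is illustrative: the `LearnedPrompt` module, the stand-in linear `backbone`, the additive conditioning, and all shapes are assumptions for the sketch, not the paper's actual architecture or training code.

```python
import torch
import torch.nn as nn

class LearnedPrompt(nn.Module):
    """Learnable embeddings injected where the text-encoder output would go."""
    def __init__(self, num_tokens=8, dim=64):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(num_tokens, dim) * 0.02)

    def forward(self, batch_size):
        # Broadcast the same learned prompt to every item in the batch.
        return self.prompt.unsqueeze(0).expand(batch_size, -1, -1)

# Stand-in for a frozen pre-trained denoiser (a real one would be a large
# diffusion backbone conditioned on these embeddings via cross-attention).
backbone = nn.Linear(64, 64)
for p in backbone.parameters():
    p.requires_grad_(False)

prompt = LearnedPrompt()
opt = torch.optim.AdamW(prompt.parameters(), lr=1e-2)

degraded = torch.randn(4, 8, 64)   # toy "degraded" latents
clean = torch.randn(4, 8, 64)      # toy "clean" targets
init = prompt.prompt.detach().clone()

for _ in range(50):
    cond = prompt(batch_size=4)
    pred = backbone(degraded + cond)  # additive conditioning (toy choice)
    loss = nn.functional.mse_loss(pred, clean)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The point of the sketch is the parameter split: gradients flow only into `prompt`, so the backbone is untouched, matching the summary's claim that restoration behavior is activated without modifying the architecture.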
πŸ“ Abstract
Pre-trained diffusion models have enabled significant advancements in All-in-One Restoration (AiOR), offering improved perceptual quality and generalization. However, diffusion-based restoration methods primarily rely on fine-tuning or Control-Net style modules to leverage the pre-trained diffusion model's priors for AiOR. In this work, we show that these pre-trained diffusion models inherently possess restoration behavior, which can be unlocked by directly learning prompt embeddings at the output of the text encoder. Interestingly, this behavior is largely inaccessible through text prompts and text-token embedding optimization. Furthermore, we observe that naive prompt learning is unstable because the forward noising process using degraded images is misaligned with the reverse sampling trajectory. To resolve this, we train prompts within a diffusion bridge formulation that aligns training and inference dynamics, enforcing a coherent denoising path from noisy degraded states to clean images. Building on these insights, we introduce our lightweight learned prompts on the pre-trained WAN video model and FLUX image models, converting them into high-performing restoration models. Extensive experiments demonstrate that our approach achieves competitive performance and generalization across diverse degradations, while avoiding fine-tuning and restoration-specific control modules.
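The abstract's diffusion bridge, a coherent path from noisy degraded states to clean images that aligns training with inference, can be illustrated with a generic Brownian-bridge-style interpolant. The linear schedule and the `sigma`-scaled variance below are assumptions for the sketch; the paper's exact bridge formulation may differ.

```python
import torch

def bridge_sample(x_clean, x_degraded, t, sigma=0.1):
    """Sample an intermediate state on a noisy bridge between two images.

    At t=0 the sample is exactly the clean image; at t=1 it is exactly the
    degraded one. In between, the mean interpolates linearly and the noise
    variance t*(1-t) pinches to zero at both endpoints, so training on these
    states follows the same trajectory the reverse sampler traverses.
    """
    t = torch.as_tensor(t, dtype=x_clean.dtype)
    mean = (1.0 - t) * x_clean + t * x_degraded
    std = sigma * torch.sqrt(t * (1.0 - t))
    return mean + std * torch.randn_like(x_clean)
```

Compared with the naive forward noising the abstract calls misaligned (noising the degraded image toward pure Gaussian noise, while inference denoises toward a clean image), a bridge of this shape pins both endpoints, so the training-time states match the states visited at inference.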
Problem

Research questions and friction points this paper is trying to address.

diffusion model
image restoration
pre-trained model
prompt learning
all-in-one restoration
Innovation

Methods, ideas, or system contributions that make the work stand out.

pre-trained diffusion models
prompt embedding learning
diffusion bridge
All-in-One Restoration
zero-shot restoration