Acquire and then Adapt: Squeezing out Text-to-Image Model for Image Restoration

📅 2025-04-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the heavy reliance on large-scale real images and high computational resources in image restoration, this paper proposes a lightweight, training-free data adaptation paradigm. Methodologically, it introduces (1) FluxGen—a “generation-as-collection” synthetic data pipeline built upon the pre-trained DiT model Flux—to autonomously generate high-fidelity prior samples; and (2) FluxIR—a low-overhead adapter with squeeze-and-excitation mechanisms—for targeted fine-tuning of the Flux backbone. Crucially, the approach eliminates the need for real-image acquisition and large-scale retraining. It achieves state-of-the-art performance on both synthetic and real-world degraded datasets, attaining superior PSNR/SSIM scores and visual quality. Moreover, its training cost is merely 8.5% of that of the current best method, significantly improving computational efficiency and enhancing privacy preservation.

📝 Abstract
Recently, pre-trained text-to-image (T2I) models have been extensively adopted for real-world image restoration because of their powerful generative prior. However, controlling these large models for image restoration usually requires a large number of high-quality images and immense computational resources for training, which is costly and not privacy-friendly. In this paper, we find that a well-trained large T2I model (i.e., Flux) is able to produce a variety of high-quality images aligned with real-world distributions, offering an unlimited supply of training samples to mitigate the above issue. Specifically, we propose a training data construction pipeline for image restoration, namely FluxGen, which includes unconditional image generation, image selection, and degraded image simulation. A novel lightweight adapter (FluxIR) with squeeze-and-excitation layers is also carefully designed to control the large Diffusion Transformer (DiT)-based T2I model so that reasonable details can be restored. Experiments demonstrate that our proposed method enables the Flux model to adapt effectively to real-world image restoration tasks, achieving superior scores and visual quality on both synthetic and real-world degradation datasets, at only about 8.5% of the training cost of current approaches.
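The abstract describes FluxGen's third stage as degraded image simulation applied to the generated high-quality samples. The paper's exact degradation model is not given here; a toy blur-downsample-noise pipeline (a common classical formulation, with all parameter values chosen for illustration) might look like:

```python
import numpy as np

def simulate_degradation(img, scale=4, blur_sigma=1.5, noise_std=0.02, seed=0):
    """Toy low-quality image synthesis: Gaussian blur -> downsample -> noise.
    img: 2-D array with values in [0, 1]. This is a generic classical
    degradation model, not the paper's actual FluxGen simulation."""
    rng = np.random.default_rng(seed)
    # separable Gaussian blur kernel
    radius = int(3 * blur_sigma)
    t = np.arange(-radius, radius + 1)
    k = np.exp(-t**2 / (2 * blur_sigma**2))
    k /= k.sum()
    # blur rows, then columns
    blurred = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, blurred)
    low = blurred[::scale, ::scale]                      # nearest-neighbour downsample
    noisy = low + rng.normal(0.0, noise_std, low.shape)  # additive Gaussian noise
    return np.clip(noisy, 0.0, 1.0)

hq = np.full((64, 64), 0.5)        # stand-in for a generated high-quality image
lq = simulate_degradation(hq)
print(lq.shape)  # (16, 16)
```

Pairs of `(hq, lq)` images produced this way form the synthetic supervision data, which is what lets the pipeline avoid collecting real degraded photographs.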
Problem

Research questions and friction points this paper is trying to address.

Control large text-to-image models for efficient image restoration
Reduce training costs and resource demands for image restoration
Generate high-quality training samples from pre-trained T2I models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Utilizes FluxGen pipeline for training data construction
Introduces FluxIR adapter for controlling T2I model
Achieves high-quality restoration with reduced training cost
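The FluxIR adapter is described as using squeeze-and-excitation layers to steer the frozen DiT backbone. As a rough illustration of the squeeze-and-excitation mechanism itself (the standard channel-attention block, not the paper's actual adapter; weights and shapes below are arbitrary), a minimal NumPy sketch:

```python
import numpy as np

def squeeze_and_excitation(features, w1, w2):
    """Standard SE channel attention: squeeze (global average pool per
    channel), then excite (bottleneck MLP with sigmoid gates) to reweight
    each channel. features: (C, H, W); w1: (C, C//r); w2: (C//r, C)."""
    c = features.shape[0]
    squeezed = features.reshape(c, -1).mean(axis=1)    # (C,) global pooling
    hidden = np.maximum(squeezed @ w1, 0.0)            # ReLU bottleneck
    gates = 1.0 / (1.0 + np.exp(-(hidden @ w2)))       # sigmoid gates in (0, 1)
    return features * gates[:, None, None]             # per-channel reweighting

rng = np.random.default_rng(0)
c, r = 8, 2                                            # channels, reduction ratio
x = rng.standard_normal((c, 16, 16))
out = squeeze_and_excitation(
    x,
    rng.standard_normal((c, c // r)),
    rng.standard_normal((c // r, c)),
)
print(out.shape)  # (8, 16, 16)
```

Because the gates lie in (0, 1), the block can only attenuate channels, which keeps such an adapter a low-overhead modulation of the backbone's features rather than a full retraining of them.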
Junyuan Deng (HKUST)
Xinyi Wu (Honor Device Co., Ltd.)
Yongxing Yang (Honor Device Co., Ltd.)
Congchao Zhu (Honor Device Co., Ltd.)
Song Wang (Honor Device Co., Ltd.; Shenzhen University of Advanced Technology)
Zhenyao Wu (Honor Device Co., Ltd.; University of South Carolina)
Computer Vision, Deep Learning, Image Processing