Acquire and then Adapt: Squeezing out Text-to-Image Model for Image Restoration

📅 2025-04-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the heavy reliance on large-scale real images and high computational resources in image restoration, this paper proposes a lightweight, training-free data adaptation paradigm. Methodologically, it introduces (1) FluxGen—a “generation-as-collection” synthetic data pipeline built upon the pre-trained DiT model Flux—to autonomously generate high-fidelity prior samples; and (2) FluxIR—a low-overhead adapter with squeeze-and-excitation mechanisms—for targeted fine-tuning of the Flux backbone. Crucially, the approach eliminates the need for real-image acquisition and large-scale retraining. It achieves state-of-the-art performance on both synthetic and real-world degraded datasets, attaining superior PSNR/SSIM scores and visual quality. Moreover, its training cost is merely 8.5% of that of the current best method, significantly improving computational efficiency and enhancing privacy preservation.

📝 Abstract
Recently, pre-trained text-to-image (T2I) models have been extensively adopted for real-world image restoration because of their powerful generative prior. However, controlling these large models for image restoration usually requires a large number of high-quality images and immense computational resources for training, which is costly and not privacy-friendly. In this paper, we find that a well-trained large T2I model (i.e., Flux) is able to produce a variety of high-quality images aligned with real-world distributions, offering an unlimited supply of training samples to mitigate the above issue. Specifically, we propose a training data construction pipeline for image restoration, namely FluxGen, which includes unconditional image generation, image selection, and degraded image simulation. A novel lightweight adapter (FluxIR) with squeeze-and-excitation layers is also carefully designed to control the large Diffusion Transformer (DiT)-based T2I model so that reasonable details can be restored. Experiments demonstrate that our proposed method enables the Flux model to adapt effectively to real-world image restoration tasks, achieving superior scores and visual quality on both synthetic and real-world degradation datasets, at only about 8.5% of the training cost of current approaches.
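The abstract describes FluxGen's third stage as degraded image simulation applied to the generated high-quality samples. The paper's exact degradation model is not given here; a toy blur-downsample-noise pipeline (a common classical formulation, with all parameter values chosen for illustration) might look like:

```python
import numpy as np

def simulate_degradation(img, scale=4, blur_sigma=1.5, noise_std=0.02, seed=0):
    """Toy low-quality image synthesis: Gaussian blur -> downsample -> noise.
    img: 2-D array with values in [0, 1]. This is a generic classical
    degradation model, not the paper's actual FluxGen simulation."""
    rng = np.random.default_rng(seed)
    # separable Gaussian blur kernel
    radius = int(3 * blur_sigma)
    t = np.arange(-radius, radius + 1)
    k = np.exp(-t**2 / (2 * blur_sigma**2))
    k /= k.sum()
    # blur rows, then columns
    blurred = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, blurred)
    low = blurred[::scale, ::scale]                      # nearest-neighbour downsample
    noisy = low + rng.normal(0.0, noise_std, low.shape)  # additive Gaussian noise
    return np.clip(noisy, 0.0, 1.0)

hq = np.full((64, 64), 0.5)        # stand-in for a generated high-quality image
lq = simulate_degradation(hq)
print(lq.shape)  # (16, 16)
```

Pairs of `(hq, lq)` images produced this way form the synthetic supervision data, which is what lets the pipeline avoid collecting real degraded photographs.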
Problem

Research questions and friction points this paper is trying to address.

Control large text-to-image models for efficient image restoration
Reduce training costs and resource demands for image restoration
Generate high-quality training samples from pre-trained T2I models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Utilizes FluxGen pipeline for training data construction
Introduces FluxIR adapter for controlling T2I model
Achieves high-quality restoration with reduced training cost
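The FluxIR adapter is described as using squeeze-and-excitation layers to steer the frozen DiT backbone. As a rough illustration of the squeeze-and-excitation mechanism itself (the standard channel-attention block, not the paper's actual adapter; weights and shapes below are arbitrary), a minimal NumPy sketch:

```python
import numpy as np

def squeeze_and_excitation(features, w1, w2):
    """Standard SE channel attention: squeeze (global average pool per
    channel), then excite (bottleneck MLP with sigmoid gates) to reweight
    each channel. features: (C, H, W); w1: (C, C//r); w2: (C//r, C)."""
    c = features.shape[0]
    squeezed = features.reshape(c, -1).mean(axis=1)    # (C,) global pooling
    hidden = np.maximum(squeezed @ w1, 0.0)            # ReLU bottleneck
    gates = 1.0 / (1.0 + np.exp(-(hidden @ w2)))       # sigmoid gates in (0, 1)
    return features * gates[:, None, None]             # per-channel reweighting

rng = np.random.default_rng(0)
c, r = 8, 2                                            # channels, reduction ratio
x = rng.standard_normal((c, 16, 16))
out = squeeze_and_excitation(
    x,
    rng.standard_normal((c, c // r)),
    rng.standard_normal((c // r, c)),
)
print(out.shape)  # (8, 16, 16)
```

Because the gates lie in (0, 1), the block can only attenuate channels, which keeps such an adapter a low-overhead modulation of the backbone's features rather than a full retraining of them.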
Junyuan Deng (HKUST)
Xinyi Wu (Honor Device Co., Ltd.)
Yongxing Yang (Honor Device Co., Ltd.)
Congchao Zhu (Honor Device Co., Ltd.)
Song Wang (Honor Device Co., Ltd.; Shenzhen University of Advanced Technology)
Zhenyao Wu (Honor Device Co., Ltd.; University of South Carolina)
Computer Vision, Deep Learning, Image Processing