🤖 AI Summary
To reduce image restoration's heavy reliance on large collections of real images and extensive computational resources, this paper proposes a lightweight adaptation paradigm trained on synthetic data. Methodologically, it introduces (1) FluxGen, a “generation-as-collection” synthetic data pipeline built on the pre-trained DiT model Flux, which produces high-quality training samples through unconditional image generation, image selection, and degraded image simulation; and (2) FluxIR, a lightweight adapter with squeeze-and-excitation layers that controls the Flux backbone for restoration. The approach removes the need to collect real images and avoids large-scale retraining. It achieves superior quantitative scores and visual quality on both synthetic and real-world degradation datasets, at only about 8.5% of the training cost of current approaches, which improves computational efficiency and makes training more privacy-friendly.
📝 Abstract
Recently, pre-trained text-to-image (T2I) models have been extensively adopted for real-world image restoration because of their powerful generative prior. However, controlling these large models for image restoration usually requires a large number of high-quality images and immense computational resources for training, which is costly and not privacy-friendly. In this paper, we find that a well-trained large T2I model (i.e., Flux) is able to produce a variety of high-quality images aligned with real-world distributions, offering an unlimited supply of training samples to mitigate the above issue. Specifically, we propose a training data construction pipeline for image restoration, namely FluxGen, which includes unconditional image generation, image selection, and degraded image simulation. A novel lightweight adapter (FluxIR) with squeeze-and-excitation layers is also carefully designed to control the large Diffusion Transformer (DiT)-based T2I model so that reasonable details can be restored. Experiments demonstrate that our proposed method enables the Flux model to adapt effectively to real-world image restoration tasks, achieving superior scores and visual quality on both synthetic and real-world degradation datasets, at only about 8.5% of the training cost of current approaches.
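The three FluxGen stages named in the abstract (unconditional generation, image selection, degraded image simulation) can be illustrated with a minimal, hypothetical sketch. The abstract does not specify the selection criterion or the degradation model, so everything here is an assumption for illustration only: `generate_image` is a random stand-in for sampling from a T2I model (it does not call Flux), `quality_score` uses pixel variance as a toy proxy for image quality, and `degrade` applies a simple box downsample followed by Gaussian noise.

```python
import random


def generate_image(rng, size=8):
    """Hypothetical stand-in for unconditional T2I sampling (not Flux).

    Returns a size x size grayscale image with values in [0, 1].
    """
    return [[rng.random() for _ in range(size)] for _ in range(size)]


def quality_score(img):
    """Toy selection criterion (assumption): pixel variance as a proxy for detail."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


def degrade(img, rng, scale=2, noise_std=0.05):
    """Toy degradation (assumption): box downsample by `scale`, then add noise.

    The image side length must be divisible by `scale`.
    """
    size = len(img)
    out = []
    for y in range(0, size, scale):
        row = []
        for x in range(0, size, scale):
            block = [img[y + dy][x + dx]
                     for dy in range(scale) for dx in range(scale)]
            value = sum(block) / len(block) + rng.gauss(0.0, noise_std)
            row.append(min(1.0, max(0.0, value)))  # clamp back to [0, 1]
        out.append(row)
    return out


def build_pairs(num_candidates=16, keep=4, seed=0):
    """Generate candidates, keep the top-scoring ones, and pair each with a degraded copy."""
    rng = random.Random(seed)
    candidates = [generate_image(rng) for _ in range(num_candidates)]
    selected = sorted(candidates, key=quality_score, reverse=True)[:keep]
    # Each training pair is (degraded input, clean target).
    return [(degrade(img, rng), img) for img in selected]


if __name__ == "__main__":
    pairs = build_pairs()
    low_quality, high_quality = pairs[0]
    print(len(pairs), len(low_quality), len(high_quality))
```

The point of the sketch is the data flow: no real image enters the pipeline, and every (degraded, clean) training pair is derived from a generated sample that passed the selection step.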