🤖 AI Summary
Existing pre-trained and zero-shot image restoration methods often deviate from human perceptual preferences, resulting in suboptimal perceptual quality. To address this, we propose **Test-Time Preference Optimization (TTPO)**, a paradigm that aligns model outputs with user preferences *during inference*, without model retraining or additional human annotations. TTPO operates in three stages: (i) diverse candidate images are generated via diffusion-based inversion; (ii) preferred and dispreferred samples are selected using automated perceptual metrics or human feedback; and (iii) the denoising process is refined under preference-derived reward signals. TTPO is agnostic to the underlying restoration backbone, constructs preference data on-the-fly, and generalizes across degradation types. Extensive experiments on super-resolution, deblurring, and deraining demonstrate that TTPO significantly improves perceptual quality and consistency with human judgments while incurring zero training overhead.
📝 Abstract
Image restoration (IR) models are typically trained to recover high-quality images using L1 or LPIPS loss. To handle diverse unknown degradations, zero-shot IR methods have also been introduced. However, existing pre-trained and zero-shot IR approaches often fail to align with human preferences, resulting in restored images that users may not favor. This highlights the need to enhance restoration quality and adapt flexibly to various image restoration tasks and backbones, without model retraining and ideally without labor-intensive preference data collection. In this paper, we propose the first Test-Time Preference Optimization (TTPO) paradigm for image restoration, which enhances perceptual quality, generates preference data on-the-fly, and is compatible with any IR model backbone. Specifically, we design a training-free, three-stage pipeline: (i) generate candidate preference images online using diffusion inversion and denoising based on the initially restored image; (ii) select preferred and dispreferred images using automated preference-aligned metrics or human feedback; and (iii) use the selected preference images as reward signals to guide the diffusion denoising process, optimizing the restored image to better align with human preferences. Extensive experiments across various image restoration tasks and models demonstrate the effectiveness and flexibility of the proposed pipeline.
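The three-stage pipeline can be sketched as follows. This is a minimal illustrative mock-up, not the paper's implementation: `invert_and_denoise` stands in for diffusion inversion plus stochastic denoising, and `preference_score` stands in for an automated perceptual metric (in practice a no-reference IQA model or human feedback); both names and the simple convex-combination guidance step are assumptions for illustration.

```python
import numpy as np

def invert_and_denoise(image, noise_level, rng):
    """Stage (i) stand-in: perturb the initial restoration and clip back to
    the valid range. A real pipeline would run diffusion inversion followed
    by stochastic denoising to produce a diverse candidate."""
    noisy = image + rng.normal(0.0, noise_level, image.shape)
    return np.clip(noisy, 0.0, 1.0)

def preference_score(candidate):
    """Stage (ii) stand-in for a preference-aligned metric. Here a toy score
    that prefers mid-tone, low-variance images; a real system would use a
    perceptual quality model or human judgments."""
    return -abs(candidate.mean() - 0.5) - candidate.var()

def ttpo_restore(initial, num_candidates=8, guidance=0.5, seed=0):
    """Training-free test-time preference optimization, toy version."""
    rng = np.random.default_rng(seed)
    # (i) generate candidate preference images online
    candidates = [invert_and_denoise(initial, 0.1, rng)
                  for _ in range(num_candidates)]
    # (ii) rank candidates and pick the preferred one
    scores = [preference_score(c) for c in candidates]
    preferred = candidates[int(np.argmax(scores))]
    # (iii) use the preferred sample as a reward signal: here we simply pull
    # the restored image toward it, a crude proxy for guiding the
    # denoising trajectory with a preference-derived reward
    return (1.0 - guidance) * initial + guidance * preferred

# Usage: refine a flat "restored" image toward the preferred candidate.
restored = ttpo_restore(np.full((8, 8), 0.9))
```

Because every step operates on the already-restored image, the procedure needs no gradient updates to the IR backbone, which is what makes the approach training-free and backbone-agnostic.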