🤖 AI Summary
Existing pre-trained and zero-shot image restoration methods often deviate from human perceptual preferences, resulting in suboptimal perceptual quality. To address this, we propose **Test-Time Preference Optimization (TTPO)**, a paradigm that aligns model outputs with user preferences *during inference*, without model retraining or additional human annotations. TTPO operates in three stages: (i) diverse candidate images are generated via diffusion-based inversion; (ii) preferred and dispreferred samples are selected using automated perceptual metrics or human feedback; and (iii) the denoising process is refined under preference-derived reward signals. TTPO is agnostic to the underlying restoration backbone, constructs preference data on-the-fly, and generalizes across degradation types. Extensive experiments on super-resolution, deblurring, and deraining demonstrate that TTPO significantly improves perceptual quality and consistency with human judgments while incurring zero training overhead.
📝 Abstract
Image restoration (IR) models are typically trained to recover high-quality images using L1 or LPIPS loss. To handle diverse unknown degradations, zero-shot IR methods have also been introduced. However, existing pre-trained and zero-shot IR approaches often fail to align with human preferences, resulting in restored images that users may not favor. This highlights the need to enhance restoration quality and adapt flexibly to various image restoration tasks and backbones, without model retraining and ideally without labor-intensive preference data collection. In this paper, we propose the first Test-Time Preference Optimization (TTPO) paradigm for image restoration, which enhances perceptual quality, generates preference data on-the-fly, and is compatible with any IR model backbone. Specifically, we design a training-free, three-stage pipeline: (i) generate candidate preference images online using diffusion inversion and denoising based on the initially restored image; (ii) select preferred and dispreferred images using automated preference-aligned metrics or human feedback; and (iii) use the selected preference images as reward signals to guide the diffusion denoising process, optimizing the restored image to better align with human preferences. Extensive experiments across various image restoration tasks and models demonstrate the effectiveness and flexibility of the proposed pipeline.
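The three-stage pipeline can be sketched as follows. This is a minimal illustrative mock-up, not the paper's implementation: `invert_and_denoise` stands in for diffusion inversion plus stochastic denoising, and `preference_score` stands in for an automated perceptual metric (in practice a no-reference IQA model or human feedback); both names and the simple convex-combination guidance step are assumptions for illustration.

```python
import numpy as np

def invert_and_denoise(image, noise_level, rng):
    """Stage (i) stand-in: perturb the initial restoration and clip back to
    the valid range. A real pipeline would run diffusion inversion followed
    by stochastic denoising to produce a diverse candidate."""
    noisy = image + rng.normal(0.0, noise_level, image.shape)
    return np.clip(noisy, 0.0, 1.0)

def preference_score(candidate):
    """Stage (ii) stand-in for a preference-aligned metric. Here a toy score
    that prefers mid-tone, low-variance images; a real system would use a
    perceptual quality model or human judgments."""
    return -abs(candidate.mean() - 0.5) - candidate.var()

def ttpo_restore(initial, num_candidates=8, guidance=0.5, seed=0):
    """Training-free test-time preference optimization, toy version."""
    rng = np.random.default_rng(seed)
    # (i) generate candidate preference images online
    candidates = [invert_and_denoise(initial, 0.1, rng)
                  for _ in range(num_candidates)]
    # (ii) rank candidates and pick the preferred one
    scores = [preference_score(c) for c in candidates]
    preferred = candidates[int(np.argmax(scores))]
    # (iii) use the preferred sample as a reward signal: here we simply pull
    # the restored image toward it, a crude proxy for guiding the
    # denoising trajectory with a preference-derived reward
    return (1.0 - guidance) * initial + guidance * preferred

# Usage: refine a flat "restored" image toward the preferred candidate.
restored = ttpo_restore(np.full((8, 8), 0.9))
```

Because every step operates on the already-restored image, the procedure needs no gradient updates to the IR backbone, which is what makes the approach training-free and backbone-agnostic.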