Test-Time Preference Optimization for Image Restoration

📅 2025-11-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing pre-trained and zero-shot image restoration methods often deviate from human perceptual preferences, yielding suboptimal perceptual quality. To address this, the paper proposes **Test-Time Preference Optimization (TTPO)**—a paradigm that aligns model outputs with human preferences *during inference*, without model retraining or additional human annotations. TTPO operates in three stages: (i) diverse candidate images are generated via diffusion inversion and denoising of the initially restored image; (ii) preferred and dispreferred samples are selected using automated preference-aligned metrics or human feedback; and (iii) the selected preferences serve as reward signals that guide the diffusion denoising process. TTPO is agnostic to the underlying restoration backbone, constructs preference data on the fly, and generalizes across degradation types. Experiments on super-resolution, deblurring, and deraining show that TTPO improves perceptual quality and consistency with human judgments while incurring no training overhead.

📝 Abstract
Image restoration (IR) models are typically trained to recover high-quality images using L1 or LPIPS loss. To handle diverse unknown degradations, zero-shot IR methods have also been introduced. However, existing pre-trained and zero-shot IR approaches often fail to align with human preferences, resulting in restored images that may not be favored. This highlights the critical need to enhance restoration quality and adapt flexibly to various image restoration tasks or backbones without requiring model retraining and ideally without labor-intensive preference data collection. In this paper, we propose the first Test-Time Preference Optimization (TTPO) paradigm for image restoration, which enhances perceptual quality, generates preference data on-the-fly, and is compatible with any IR model backbone. Specifically, we design a training-free, three-stage pipeline: (i) generate candidate preference images online using diffusion inversion and denoising based on the initially restored image; (ii) select preferred and dispreferred images using automated preference-aligned metrics or human feedback; and (iii) use the selected preference images as reward signals to guide the diffusion denoising process, optimizing the restored image to better align with human preferences. Extensive experiments across various image restoration tasks and models demonstrate the effectiveness and flexibility of the proposed pipeline.
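The three-stage pipeline described above can be sketched as a minimal, dependency-free toy. Everything here is an illustrative stand-in, not the paper's implementation: images are short float lists, "diffusion inversion and denoising" is replaced by random perturbation of the initial restoration, and the preference metric is negative MSE against a proxy reference rather than a learned perceptual metric or human feedback.

```python
import random

random.seed(0)  # deterministic toy run

def generate_candidates(restored, n=4, noise=0.1):
    # Stage (i), toy stand-in: perturb the initial restoration to build a
    # diverse candidate pool (a real pipeline would use diffusion inversion
    # and denoising). The initial restoration itself stays in the pool.
    pool = [[x + random.uniform(-noise, noise) for x in restored] for _ in range(n)]
    return pool + [list(restored)]

def preference_score(image, reference):
    # Stage (ii), toy metric: negative MSE against a proxy "preferred" image;
    # a real pipeline would use a preference-aligned metric or human feedback.
    return -sum((a - b) ** 2 for a, b in zip(image, reference)) / len(image)

def preference_guided_update(restored, preferred, step=0.5):
    # Stage (iii), toy reward guidance: move the output toward the preferred
    # candidate (stands in for reward-guided diffusion denoising).
    return [x + step * (p - x) for x, p in zip(restored, preferred)]

def ttpo_step(restored, reference):
    candidates = generate_candidates(restored)
    ranked = sorted(candidates, key=lambda c: preference_score(c, reference),
                    reverse=True)
    preferred, dispreferred = ranked[0], ranked[-1]  # dispreferred unused in this toy update
    return preference_guided_update(restored, preferred)

restored = [0.2, 0.5, 0.8]       # initial restoration (a 3-"pixel" image)
reference = [0.25, 0.55, 0.75]   # proxy for human preference
improved = ttpo_step(restored, reference)
# Because the pool keeps the initial restoration and negative MSE is concave,
# one step toward the best-scoring candidate can never lower the score.
assert preference_score(improved, reference) >= preference_score(restored, reference)
```

Since the candidate pool always contains the initial restoration, the preferred candidate scores at least as well as the current output, so each update is guaranteed not to degrade the toy preference score; the real method replaces each stand-in with diffusion-based generation and guidance.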
Problem

Research questions and friction points this paper is trying to address.

Optimizing image restoration models to align with human preferences
Eliminating the need for model retraining in restoration tasks
Generating preference-aligned restoration without manual data collection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Test-time optimization enhances perceptual quality automatically
Generates preference data on-the-fly without retraining models
Compatible with any image restoration backbone via diffusion guidance
Bingchen Li
USTC
Xin Li
University of Science and Technology of China
Jiaqi Xu
Huawei Noah’s Ark Lab
Jiaming Guo
Institute of Computing Technology, Chinese Academy of Sciences
Artificial Intelligence, Reinforcement Learning
Wenbo Li
The Chinese University of Hong Kong
Computer Vision, Deep Learning
Renjing Pei
Huawei Noah’s Ark Lab
Zhibo Chen
University of Science and Technology of China