UARE: A Unified Vision-Language Model for Image Quality Assessment, Restoration, and Enhancement

📅 2025-12-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the long-standing disconnect between image quality assessment (IQA) and image restoration by proposing the first unified vision-language multimodal framework that jointly optimizes IQA, restoration, and enhancement. Methodologically, it employs a two-stage progressive training strategy: first training progressively from single-type to higher-order mixed degradations, then jointly fine-tuning quality understanding and restoration generation on interleaved image-text data. Its key innovation is explicitly embedding quality awareness into the restoration process, revealing how IQA guidance improves generative performance. Extensive experiments on multiple IQA and image restoration benchmarks show significant improvements over single-task models: restored images achieve superior objective metrics and higher consistency with human perception, validating the effectiveness and generalizability of cross-task collaborative modeling.

📝 Abstract
Image quality assessment (IQA) and image restoration are fundamental problems in low-level vision. Although IQA and restoration are closely connected conceptually, most existing work treats them in isolation. Recent advances in unified multimodal understanding-generation models demonstrate promising results and indicate that stronger understanding can improve generative performance. This motivates a single model that unifies IQA and restoration and explicitly studies how IQA can guide restoration, a setting that remains largely underexplored yet highly valuable. In this paper, we propose UARE, to our knowledge the first Unified vision-language model for image quality Assessment, Restoration, and Enhancement. Built on pretrained unified understanding and generation models, we introduce a two-stage training framework. First, a progressive, easy-to-hard schedule expands from single-type distortions to higher-order mixed degradations, enabling UARE to handle multiple degradations. Second, we perform unified fine-tuning of quality understanding and restoration with interleaved text-image data, aligning IQA signals with restoration objectives. Through multi-task co-training, UARE leverages IQA to boost restoration and enhancement performance. Extensive experiments across IQA, restoration, and enhancement tasks demonstrate the effectiveness of UARE. The code and models will be available at https://github.com/lwq20020127/UARE.
Problem

Research questions and friction points this paper is trying to address.

IQA and restoration are closely connected, yet most existing work treats them in isolation
How IQA signals can guide the restoration process remains largely underexplored
Handling multiple, higher-order mixed degradations within a single model is difficult
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified vision-language model for assessment, restoration, enhancement
Two-stage training with progressive distortion schedule
Multi-task co-training aligning IQA signals with restoration
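The progressive distortion schedule above can be sketched as a simple curriculum sampler. This is a hypothetical illustration, not code from the paper: the distortion names, schedule shape, and `max_order` parameter are all assumptions; UARE's actual degradation pipeline is not specified on this page.

```python
import random

# Hypothetical sketch of a stage-1 "easy-to-hard" degradation curriculum:
# early training steps sample a single distortion type; later steps compose
# several distortions into higher-order mixed degradations.
# Distortion names and the linear schedule are illustrative assumptions.

DISTORTIONS = ["blur", "noise", "jpeg", "rain", "haze", "low_light"]

def degradation_plan(step, total_steps, max_order=3, rng=random):
    """Return the list of distortions to apply at a given training step."""
    progress = step / total_steps                        # 0.0 -> 1.0
    order = min(max_order, 1 + int(progress * max_order))  # 1 early, max_order late
    return rng.sample(DISTORTIONS, k=order)

# Early steps yield a single distortion; late steps yield mixed degradations.
print(len(degradation_plan(0, 1000)))    # -> 1
print(len(degradation_plan(999, 1000)))  # -> 3
```

A real pipeline would apply these distortions to clean images on the fly and pair each degraded image with quality-related text for the interleaved stage-2 fine-tuning.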
👥 Authors
Weiqi Li
School of Electronic and Computer Engineering, Peking University
Xuanyu Zhang
School of Electronic and Computer Engineering, Peking University; ByteDance Inc.
Bin Chen
School of Electronic and Computer Engineering, Peking University; ByteDance Inc.
Jingfen Xie
ByteDance Inc.
Yan Wang
ByteDance Inc.
Kexin Zhang
Tsinghua University
Junlin Li
ByteDance Inc.; Georgia Institute of Technology; Tsinghua University
Li Zhang
ByteDance Inc.
Jian Zhang
School of Electronic and Computer Engineering, Peking University
Shijie Zhao
ByteDance Inc.