Zoom-IQA: Image Quality Assessment with Reliable Region-Aware Reasoning

📅 2026-01-06
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge that existing vision-language models struggle to reliably integrate visual and textual cues for image quality assessment, often yielding inaccurate scores and poor interpretability. To overcome this, the authors propose Zoom-IQA, a novel model that emulates human cognitive mechanisms—namely uncertainty awareness, region-based reasoning, and iterative refinement—to achieve both accurate and interpretable evaluations. The approach employs a two-stage training strategy: first, supervised fine-tuning on a newly curated GR-IQA dataset, followed by reinforcement learning enhanced with KL-Coverage regularization and progressive resampling to improve reasoning reliability and generalization. Experiments demonstrate that Zoom-IQA significantly outperforms current methods in robustness, interpretability, and cross-task generalization, with its effectiveness further validated in downstream applications such as image restoration.
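The summary mentions "progressive resampling" as a way to mitigate annotation bias during RL training, but the paper's exact formulation is not given here. As an illustrative sketch only (function name, binning scheme, and schedule are all assumptions, not the authors' method), one common way to realize such an idea is to interpolate sampling weights from uniform toward inverse-frequency over the course of training, so that under-represented quality-score bins are gradually up-weighted:

```python
from collections import Counter

def progressive_weights(scores, step, total_steps, num_bins=5, lo=1.0, hi=5.0):
    """Hypothetical progressive re-sampling sketch (not the paper's exact method).

    Returns per-example sampling weights over `scores` (mean-opinion scores in
    [lo, hi]). At step 0 the weights are uniform; by `total_steps` they are
    fully inverse-frequency over score bins, easing in the bias correction.
    """
    def bin_of(s):
        return min(num_bins - 1, int((s - lo) / (hi - lo) * num_bins))

    counts = Counter(bin_of(s) for s in scores)
    alpha = step / total_steps  # 0 -> uniform, 1 -> inverse-frequency
    weights = []
    for s in scores:
        inv = (1.0 / counts[bin_of(s)]) * len(scores) / num_bins
        weights.append((1 - alpha) * 1.0 + alpha * inv)
    z = sum(weights)
    return [w / z for w in weights]  # normalized sampling distribution
```

These weights could then feed a weighted sampler (e.g. `random.choices(data, weights=w)`) so that rare score levels appear more often as training progresses.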

📝 Abstract
Image Quality Assessment (IQA) is a long-standing problem in computer vision. Previous methods typically focus on predicting numerical scores without explanation or on providing low-level descriptions lacking precise scores. Recent reasoning-based vision-language models (VLMs) have shown strong potential for IQA by jointly generating quality descriptions and scores. However, existing VLM-based IQA methods often suffer from unreliable reasoning due to their limited capability of integrating visual and textual cues. In this work, we introduce Zoom-IQA, a VLM-based IQA model that explicitly emulates key cognitive behaviors: uncertainty awareness, region reasoning, and iterative refinement. Specifically, we present a two-stage training pipeline: 1) supervised fine-tuning (SFT) on our Grounded-Rationale-IQA (GR-IQA) dataset to teach the model to ground its assessments in key regions, and 2) reinforcement learning (RL) for dynamic policy exploration, stabilized by our KL-Coverage regularizer to prevent reasoning and scoring diversity collapse, with a Progressive Re-sampling Strategy for mitigating annotation bias. Extensive experiments show that Zoom-IQA achieves improved robustness, explainability, and generalization. The application to downstream tasks, such as image restoration, further demonstrates the effectiveness of Zoom-IQA.
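The abstract describes a KL-Coverage regularizer that stabilizes RL by preventing reasoning and scoring diversity from collapsing. The paper's precise loss is not reproduced here, so the following is only a hedged sketch of the general pattern such a term could take (all names and the coverage heuristic are illustrative assumptions): the task reward is combined with a KL penalty toward a reference distribution plus a penalty that grows when sampled scores concentrate in few quality bins:

```python
import math

def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def coverage_penalty(sampled_scores, num_bins=10, lo=0.0, hi=10.0):
    """Toy diversity-collapse penalty: 1 minus the fraction of score bins
    that the sampled rollouts cover (0 = full coverage, ~1 = collapse)."""
    covered = {min(num_bins - 1, int((s - lo) / (hi - lo) * num_bins))
               for s in sampled_scores}
    return 1.0 - len(covered) / num_bins

def regularized_objective(reward, policy_dist, ref_dist, sampled_scores,
                          beta=0.1, gamma=0.05):
    """Illustrative RL objective: reward minus a KL term toward a reference
    policy and a coverage penalty on the sampled score distribution."""
    return (reward
            - beta * kl_divergence(policy_dist, ref_dist)
            - gamma * coverage_penalty(sampled_scores))
```

In this sketch `beta` trades off staying close to the SFT reference while `gamma` discourages all rollouts from emitting the same score, mirroring the stated goal of preventing scoring-diversity collapse.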
Problem

Research questions and friction points this paper is trying to address.

Image Quality Assessment
Vision Language Models
Reliable Reasoning
Region-Aware
Uncertainty Awareness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Region-Aware Reasoning
Uncertainty Awareness
Reinforcement Learning
KL-Coverage Regularizer
Progressive Re-sampling