Zoom-IQA: Image Quality Assessment with Reliable Region-Aware Reasoning

📅 2026-01-06
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge that existing vision-language models struggle to reliably integrate visual and textual cues for image quality assessment, often yielding inaccurate scores and poor interpretability. To overcome this, the authors propose Zoom-IQA, a novel model that emulates human cognitive mechanisms—namely uncertainty awareness, region-based reasoning, and iterative refinement—to achieve both accurate and interpretable evaluations. The approach employs a two-stage training strategy: first, supervised fine-tuning on a newly curated GR-IQA dataset, followed by reinforcement learning enhanced with KL-Coverage regularization and progressive resampling to improve reasoning reliability and generalization. Experiments demonstrate that Zoom-IQA significantly outperforms current methods in robustness, interpretability, and cross-task generalization, with its effectiveness further validated in downstream applications such as image restoration.
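The summary mentions "progressive resampling" as a way to mitigate annotation bias during RL training, but the paper's exact formulation is not given here. As an illustrative sketch only (function name, binning scheme, and schedule are all assumptions, not the authors' method), one common way to realize such an idea is to interpolate sampling weights from uniform toward inverse-frequency over the course of training, so that under-represented quality-score bins are gradually up-weighted:

```python
from collections import Counter

def progressive_weights(scores, step, total_steps, num_bins=5, lo=1.0, hi=5.0):
    """Hypothetical progressive re-sampling sketch (not the paper's exact method).

    Returns per-example sampling weights over `scores` (mean-opinion scores in
    [lo, hi]). At step 0 the weights are uniform; by `total_steps` they are
    fully inverse-frequency over score bins, easing in the bias correction.
    """
    def bin_of(s):
        return min(num_bins - 1, int((s - lo) / (hi - lo) * num_bins))

    counts = Counter(bin_of(s) for s in scores)
    alpha = step / total_steps  # 0 -> uniform, 1 -> inverse-frequency
    weights = []
    for s in scores:
        inv = (1.0 / counts[bin_of(s)]) * len(scores) / num_bins
        weights.append((1 - alpha) * 1.0 + alpha * inv)
    z = sum(weights)
    return [w / z for w in weights]  # normalized sampling distribution
```

These weights could then feed a weighted sampler (e.g. `random.choices(data, weights=w)`) so that rare score levels appear more often as training progresses.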

📝 Abstract
Image Quality Assessment (IQA) is a long-standing problem in computer vision. Previous methods typically focus on predicting numerical scores without explanation or on providing low-level descriptions lacking precise scores. Recent reasoning-based vision-language models (VLMs) have shown strong potential for IQA by jointly generating quality descriptions and scores. However, existing VLM-based IQA methods often suffer from unreliable reasoning due to their limited capability of integrating visual and textual cues. In this work, we introduce Zoom-IQA, a VLM-based IQA model that explicitly emulates key cognitive behaviors: uncertainty awareness, region reasoning, and iterative refinement. Specifically, we present a two-stage training pipeline: 1) supervised fine-tuning (SFT) on our Grounded-Rationale-IQA (GR-IQA) dataset to teach the model to ground its assessments in key regions, and 2) reinforcement learning (RL) for dynamic policy exploration, stabilized by our KL-Coverage regularizer to prevent reasoning and scoring diversity collapse, with a Progressive Re-sampling Strategy for mitigating annotation bias. Extensive experiments show that Zoom-IQA achieves improved robustness, explainability, and generalization. The application to downstream tasks, such as image restoration, further demonstrates the effectiveness of Zoom-IQA.
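The abstract describes a KL-Coverage regularizer that stabilizes RL by preventing reasoning and scoring diversity from collapsing. The paper's precise loss is not reproduced here, so the following is only a hedged sketch of the general pattern such a term could take (all names and the coverage heuristic are illustrative assumptions): the task reward is combined with a KL penalty toward a reference distribution plus a penalty that grows when sampled scores concentrate in few quality bins:

```python
import math

def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def coverage_penalty(sampled_scores, num_bins=10, lo=0.0, hi=10.0):
    """Toy diversity-collapse penalty: 1 minus the fraction of score bins
    that the sampled rollouts cover (0 = full coverage, ~1 = collapse)."""
    covered = {min(num_bins - 1, int((s - lo) / (hi - lo) * num_bins))
               for s in sampled_scores}
    return 1.0 - len(covered) / num_bins

def regularized_objective(reward, policy_dist, ref_dist, sampled_scores,
                          beta=0.1, gamma=0.05):
    """Illustrative RL objective: reward minus a KL term toward a reference
    policy and a coverage penalty on the sampled score distribution."""
    return (reward
            - beta * kl_divergence(policy_dist, ref_dist)
            - gamma * coverage_penalty(sampled_scores))
```

In this sketch `beta` trades off staying close to the SFT reference while `gamma` discourages all rollouts from emitting the same score, mirroring the stated goal of preventing scoring-diversity collapse.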
Problem

Research questions and friction points this paper is trying to address.

Image Quality Assessment
Vision Language Models
Reliable Reasoning
Region-Aware
Uncertainty Awareness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Region-Aware Reasoning
Uncertainty Awareness
Reinforcement Learning
KL-Coverage Regularizer
Progressive Re-sampling