ME-IQA: Memory-Enhanced Image Quality Assessment via Re-Ranking

📅 2026-03-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of reasoning-induced vision-language models (VLMs) in image quality assessment (IQA): their scalar scores tend to collapse to a few discrete values (discrete collapse) and therefore miss fine-grained distortion differences. The authors propose ME-IQA, a plug-and-play, test-time, memory-enhanced re-ranking framework. It maintains a memory bank of previously assessed samples and uses reasoning summaries to retrieve neighbors that are semantically and perceptually aligned with the query image. The VLM is then reframed as a probabilistic comparator: it produces pairwise preference probabilities between the query and its neighbors, and this ordinal evidence is fused with the initial score under Thurstone's Case V psychometric model. A gated reflection step consolidates the memory to improve later decisions. Presented as the first approach to combine test-time memory with probabilistic preference modeling for IQA, it shows consistent gains over reasoning-induced VLM baselines, non-reasoning IQA methods, and test-time scaling strategies across multiple benchmarks, yielding denser and more distortion-sensitive quality estimates.
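The retrieval step can be illustrated with a minimal sketch. It assumes that each reasoning summary has already been turned into a fixed-length embedding vector and that neighbors are ranked by cosine similarity; the source states neither detail, and the names `MemoryBank`, `MemoryEntry`, and `retrieve` are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class MemoryEntry:
    # Assumption: an embedding of the VLM's reasoning summary for this image.
    summary_embedding: list[float]
    # Quality score stored for this image.
    score: float


@dataclass
class MemoryBank:
    entries: list[MemoryEntry] = field(default_factory=list)

    def add(self, embedding: list[float], score: float) -> None:
        """Store a new sample in the memory bank."""
        self.entries.append(MemoryEntry(list(embedding), score))

    def retrieve(self, query_embedding: list[float], k: int = 3) -> list[MemoryEntry]:
        """Return the k stored entries most similar to the query."""
        ranked = sorted(
            self.entries,
            key=lambda e: cosine_similarity(query_embedding, e.summary_embedding),
            reverse=True,
        )
        return ranked[:k]


bank = MemoryBank()
bank.add([1.0, 0.0], 4.0)
bank.add([0.0, 1.0], 2.0)
bank.add([0.9, 0.1], 3.5)
neighbors = bank.retrieve([1.0, 0.0], k=2)
neighbor_scores = [entry.score for entry in neighbors]
```

In this toy example the two entries whose embeddings point in nearly the same direction as the query are returned, and the orthogonal one is left out. A real system would likely use a vector index rather than a full sort.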

📝 Abstract

Reasoning-induced vision-language models (VLMs) advance image quality assessment (IQA) with textual reasoning, yet their scalar scores often lack sensitivity and collapse to a few values, so-called discrete collapse. We introduce ME-IQA, a plug-and-play, test-time memory-enhanced re-ranking framework. It (i) builds a memory bank and retrieves semantically and perceptually aligned neighbors using reasoning summaries, (ii) reframes the VLM as a probabilistic comparator to obtain pairwise preference probabilities and fuse this ordinal evidence with the initial score under Thurstone's Case V model, and (iii) performs gated reflection and consolidates memory to improve future decisions. This yields denser, distortion-sensitive predictions and mitigates discrete collapse. Experiments across multiple IQA benchmarks show consistent gains over strong reasoning-induced VLM baselines, existing non-reasoning IQA methods, and test-time scaling alternatives.
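To make the fusion in step (ii) concrete, here is a minimal sketch under stated assumptions. Thurstone's Case V models the probability that item a is preferred over item b as Φ((q_a − q_b) / (σ√2)), with equal, uncorrelated variances σ². Inverting that relation turns each pairwise preference probability into an implied score for the query, and those implied scores are averaged with the initial score. The averaging rule, the `prior_weight` parameter, the clipping constant `eps`, and the function names are illustrative assumptions, not the paper's stated procedure.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and std 1
_SQRT2 = 2 ** 0.5


def thurstone_preference(q_a: float, q_b: float, sigma: float = 1.0) -> float:
    """P(a preferred over b) under Thurstone Case V with per-item std sigma."""
    return _STD_NORMAL.cdf((q_a - q_b) / (sigma * _SQRT2))


def fuse_score(
    initial_score: float,
    neighbor_scores: list[float],
    pref_probs: list[float],
    sigma: float = 1.0,
    prior_weight: float = 1.0,  # assumption: weight given to the initial score
    eps: float = 1e-3,          # assumption: clip to keep inv_cdf finite
) -> float:
    """Combine the initial score with pairwise preference evidence.

    pref_probs[k] is the probability that the query is preferred over
    neighbor k, whose known score is neighbor_scores[k].
    """
    implied = []
    for score, prob in zip(neighbor_scores, pref_probs):
        prob = min(max(prob, eps), 1.0 - eps)
        # Invert the Case V relation to get the query score this pair implies.
        implied.append(score + sigma * _SQRT2 * _STD_NORMAL.inv_cdf(prob))
    total_weight = prior_weight + len(implied)
    return (prior_weight * initial_score + sum(implied)) / total_weight


# A query initially scored 3.0 that the comparator prefers over both neighbors
# is pulled above its initial score.
refined = fuse_score(3.0, neighbor_scores=[3.0, 3.2], pref_probs=[0.8, 0.7])
```

Because the implied scores are continuous, the fused estimate can fall between the few values that the scalar predictions collapse to, which is the intuition behind the denser predictions the abstract describes.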

Problem

Research questions and friction points this paper is trying to address.

image quality assessment

discrete collapse

vision-language models

scalar scores

sensitivity

Innovation

Methods, ideas, or system contributions that make the work stand out.

memory-enhanced

re-ranking

vision-language model