Understanding Pure Textual Reasoning for Blind Image Quality Assessment

📅 2026-01-05

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study examines how much textual information actually contributes to blind image quality assessment (BIQA) and how well text can represent score-related image content. The authors compare existing BIQA models with three paradigms (Chain-of-Thought, Self-Consistency, and Autoencoder) to analyze the information flow among images, text, and quality scores. Existing models lose substantial accuracy when they predict from text alone. Chain-of-Thought brings little improvement, whereas Self-Consistency narrows the gap between image-conditioned and text-conditioned predictions to a PLCC/SRCC difference of 0.02/0.03. The Autoencoder-like paradigm closes less of the gap but points to a direction for further optimization.

📝 Abstract

Textual reasoning has recently been widely adopted in Blind Image Quality Assessment (BIQA). However, it remains unclear how textual information contributes to quality prediction and to what extent text can represent the score-related image contents. This work addresses these questions from an information-flow perspective by comparing existing BIQA models with three paradigms designed to learn the image-text-score relationship: Chain-of-Thought, Self-Consistency, and Autoencoder. Our experiments show that the score prediction performance of the existing model significantly drops when only textual information is used for prediction. Whereas the Chain-of-Thought paradigm introduces little improvement in BIQA performance, the Self-Consistency paradigm significantly reduces the gap between image- and text-conditioned predictions, narrowing the PLCC/SRCC difference to 0.02/0.03. The Autoencoder-like paradigm is less effective in closing the image-text gap, yet it reveals a direction for further optimization. These findings provide insights into how to improve the textual reasoning for BIQA and high-level vision tasks.
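
The PLCC/SRCC difference quoted above can be illustrated with a minimal sketch. It assumes the gap is simply the correlation of image-conditioned predictions with the ground-truth scores minus the correlation of text-conditioned predictions with the same scores; the abstract does not spell out the exact protocol (for example, whether a nonlinear mapping is fitted before PLCC), so the function names and the toy numbers below are hypothetical and not taken from the paper.

```python
from math import sqrt


def plcc(x, y):
    """Pearson linear correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def srcc(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return plcc(average_ranks(x), average_ranks(y))


def correlation_gap(scores, image_pred, text_pred):
    """(PLCC gap, SRCC gap) between image- and text-conditioned predictions.

    Assumption: gap = metric(image_pred, scores) - metric(text_pred, scores).
    """
    return (
        plcc(image_pred, scores) - plcc(text_pred, scores),
        srcc(image_pred, scores) - srcc(text_pred, scores),
    )


if __name__ == "__main__":
    # Toy values for illustration only; not data from the paper.
    scores = [1.2, 2.5, 3.1, 4.0, 4.8]
    image_pred = [1.0, 2.7, 3.0, 4.2, 4.6]
    text_pred = [1.5, 2.0, 3.6, 3.4, 4.9]
    plcc_gap, srcc_gap = correlation_gap(scores, image_pred, text_pred)
    print(f"PLCC gap: {plcc_gap:.3f}, SRCC gap: {srcc_gap:.3f}")
```

A smaller gap means that text-conditioned predictions track the ground-truth scores almost as well as image-conditioned ones, which is the sense in which the Self-Consistency paradigm is reported to narrow the difference.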

Problem

Research questions and friction points this paper is trying to address.

Blind Image Quality Assessment

Textual Reasoning

Image-Text Relationship

Quality Prediction

Information Flow

Innovation

Methods, ideas, or system contributions that make the work stand out.

Blind Image Quality Assessment

Textual Reasoning

Self-Consistency