Enhancing Image Quality Assessment Ability of LMMs via Retrieval-Augmented Generation

📅 2026-01-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes IQARAG, a training-free framework that integrates retrieval-augmented generation (RAG) into image quality assessment (IQA) for large multimodal models (LMMs). While existing LMMs exhibit zero-shot capabilities in IQA, they often require costly fine-tuning to achieve state-of-the-art performance. IQARAG addresses this limitation by retrieving semantically similar reference images with diverse quality levels and their corresponding mean opinion scores (MOS), providing visual perceptual anchors to guide the LMM. The method employs a three-stage pipeline — retrieval feature extraction, image retrieval, and prompt integration with quality score generation — to improve the model's scoring accuracy. Extensive experiments on multiple benchmarks, including KADID, KonIQ, LIVE Challenge, and SPAQ, demonstrate significant performance improvements, confirming the framework's effectiveness and generalization capability across diverse IQA scenarios.

📝 Abstract
Large Multimodal Models (LMMs) have recently shown remarkable promise in low-level visual perception tasks, particularly in Image Quality Assessment (IQA), demonstrating strong zero-shot capability. However, achieving state-of-the-art performance often requires computationally expensive fine-tuning methods, which aim to align the distribution of quality-related tokens in the output with image quality levels. Inspired by recent training-free works for LMMs, we introduce IQARAG, a novel, training-free framework that enhances LMMs' IQA ability. IQARAG leverages Retrieval-Augmented Generation (RAG) to retrieve semantically similar but quality-variant reference images, with their corresponding Mean Opinion Scores (MOSs), for the input image. The retrieved images and the input image are integrated into a specific prompt, so that the retrieved images provide the LMM with a visual perception anchor for the IQA task. IQARAG contains three key phases: Retrieval Feature Extraction, Image Retrieval, and Integration & Quality Score Generation. Extensive experiments across multiple diverse IQA datasets, including KADID, KonIQ, LIVE Challenge, and SPAQ, demonstrate that the proposed IQARAG effectively boosts the IQA performance of LMMs, offering a resource-efficient alternative to fine-tuning for quality assessment.
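The paper does not provide code here, but the retrieval and prompt-construction stages it describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the two-dimensional feature vectors, and the small MOS database are all hypothetical, and in practice the retrieval features would come from a vision encoder and the prompt would interleave the actual reference images with their scores.

```python
import numpy as np

def retrieve_references(query_feat, db_feats, db_mos, k=3):
    """Image Retrieval stage (sketch): rank database images by cosine
    similarity to the query's retrieval feature, return the top-k
    indices and their mean opinion scores (MOS)."""
    q = query_feat / np.linalg.norm(query_feat)
    d = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
    sims = d @ q                      # cosine similarity per database image
    idx = np.argsort(-sims)[:k]       # most similar first
    return idx, db_mos[idx]

def build_prompt(mos_values):
    """Integration stage (sketch): fold the retrieved references' MOS
    values into a scoring prompt that anchors the LMM's judgment."""
    lines = ["Rate the quality of the input image on a 1-5 scale."]
    for i, mos in enumerate(mos_values, 1):
        lines.append(f"Reference image {i}, similar in content, has MOS {mos:.2f}.")
    lines.append("Considering these references, output a score for the input image.")
    return "\n".join(lines)
```

In this sketch the references act purely as in-context anchors: no gradient update touches the LMM, which is what makes the framework training-free.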
Problem

Research questions and friction points this paper is trying to address.

Image Quality Assessment
Large Multimodal Models
Retrieval-Augmented Generation
Zero-shot Learning
Computational Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-Augmented Generation
Image Quality Assessment
Large Multimodal Models
Zero-shot Learning
Training-free Framework
Kang Fu
Shanghai Jiao Tong University
Huiyu Duan
Shanghai Jiao Tong University
Multimedia Signal Processing
Zicheng Zhang
Shanghai Jiao Tong University
Yucheng Zhu
Shanghai Jiao Tong University
Multimedia Signal Processing
Jun Zhao
Tencent
Xiongkuo Min
Shanghai Jiao Tong University
Jia Wang
Shanghai Jiao Tong University
Guangtao Zhai
Professor, IEEE Fellow, Shanghai Jiao Tong University
Multimedia Signal Processing · Visual Quality Assessment · QoE · AI Evaluation · Displays