Enhancing Image Quality Assessment Ability of LMMs via Retrieval-Augmented Generation

📅 2026-01-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes IQARAG, a training-free framework that integrates retrieval-augmented generation (RAG) into image quality assessment (IQA) for large multimodal models (LMMs). While existing LMMs exhibit zero-shot capabilities in IQA, they often require costly fine-tuning to achieve state-of-the-art performance. IQARAG addresses this limitation by retrieving semantically similar reference images with diverse quality levels and their corresponding mean opinion scores (MOS), providing visual perceptual anchors to guide the LMM. The method employs a three-stage pipeline — retrieval feature extraction, image retrieval, and prompt integration with quality score generation — to improve the model's scoring accuracy. Extensive experiments on multiple benchmarks, including KADID, KonIQ, LIVE Challenge, and SPAQ, demonstrate significant performance improvements, confirming the framework's effectiveness and generalization capability across diverse IQA scenarios.

📝 Abstract
Large Multimodal Models (LMMs) have recently shown remarkable promise in low-level visual perception tasks, particularly in Image Quality Assessment (IQA), demonstrating strong zero-shot capability. However, achieving state-of-the-art performance often requires computationally expensive fine-tuning methods, which aim to align the distribution of quality-related tokens in the output with image quality levels. Inspired by recent training-free works for LMMs, we introduce IQARAG, a novel, training-free framework that enhances LMMs' IQA ability. IQARAG leverages Retrieval-Augmented Generation (RAG) to retrieve semantically similar but quality-variant reference images, with their corresponding Mean Opinion Scores (MOSs), for the input image. The retrieved images and the input image are integrated into a specific prompt, so that the retrieved images provide the LMM with a visual perception anchor for the IQA task. IQARAG contains three key phases: Retrieval Feature Extraction, Image Retrieval, and Integration & Quality Score Generation. Extensive experiments across multiple diverse IQA datasets, including KADID, KonIQ, LIVE Challenge, and SPAQ, demonstrate that the proposed IQARAG effectively boosts the IQA performance of LMMs, offering a resource-efficient alternative to fine-tuning for quality assessment.
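The paper does not provide code here, but the retrieval and prompt-construction stages it describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the two-dimensional feature vectors, and the small MOS database are all hypothetical, and in practice the retrieval features would come from a vision encoder and the prompt would interleave the actual reference images with their scores.

```python
import numpy as np

def retrieve_references(query_feat, db_feats, db_mos, k=3):
    """Image Retrieval stage (sketch): rank database images by cosine
    similarity to the query's retrieval feature, return the top-k
    indices and their mean opinion scores (MOS)."""
    q = query_feat / np.linalg.norm(query_feat)
    d = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
    sims = d @ q                      # cosine similarity per database image
    idx = np.argsort(-sims)[:k]       # most similar first
    return idx, db_mos[idx]

def build_prompt(mos_values):
    """Integration stage (sketch): fold the retrieved references' MOS
    values into a scoring prompt that anchors the LMM's judgment."""
    lines = ["Rate the quality of the input image on a 1-5 scale."]
    for i, mos in enumerate(mos_values, 1):
        lines.append(f"Reference image {i}, similar in content, has MOS {mos:.2f}.")
    lines.append("Considering these references, output a score for the input image.")
    return "\n".join(lines)
```

In this sketch the references act purely as in-context anchors: no gradient update touches the LMM, which is what makes the framework training-free.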
Problem

Research questions and friction points this paper is trying to address.

Image Quality Assessment
Large Multimodal Models
Retrieval-Augmented Generation
Zero-shot Learning
Computational Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-Augmented Generation
Image Quality Assessment
Large Multimodal Models
Zero-shot Learning
Training-free Framework
Kang Fu
Shanghai Jiao Tong University
Huiyu Duan
Shanghai Jiao Tong University
Multimedia Signal Processing
Zicheng Zhang
Shanghai Jiao Tong University
Yucheng Zhu
Shanghai Jiao Tong University
Multimedia Signal Processing
Jun Zhao
Tencent
Xiongkuo Min
Shanghai Jiao Tong University
Jia Wang
Shanghai Jiao Tong University
Guangtao Zhai
Professor, IEEE Fellow, Shanghai Jiao Tong University
Multimedia Signal Processing · Visual Quality Assessment · QoE · AI Evaluation · Displays