🤖 AI Summary
This work proposes IQARAG, the first training-free framework that integrates retrieval-augmented generation (RAG) into image quality assessment (IQA) for large multimodal models (LMMs). Existing LMMs show zero-shot capability in IQA, but they usually require costly fine-tuning to reach state-of-the-art performance. IQARAG avoids this by retrieving reference images that are semantically similar to the input but span diverse quality levels, together with their mean opinion scores (MOS). These references act as visual perceptual anchors that guide the LMM's judgment. The method uses a three-stage pipeline (retrieval feature extraction, image retrieval, and integration with quality score generation) to improve the model's scoring accuracy. Extensive experiments on multiple benchmarks, including KADID, KonIQ, LIVE Challenge, and SPAQ, show consistent gains in the IQA performance of LMMs, supporting the framework's effectiveness and its generalization across diverse IQA scenarios.
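The retrieval stage can be illustrated with a small sketch. This is not the authors' implementation: it assumes retrieval features are already extracted as plain vectors, ranks a reference database by cosine similarity, and then uses a hypothetical heuristic (picking evenly spaced MOS values among the top candidates) to obtain "semantically similar but quality-variant" references. The names `ReferenceImage`, `retrieve_references`, and `n_candidates` are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass
class ReferenceImage:
    image_id: str
    feature: list  # precomputed retrieval feature (assumed, e.g. an embedding)
    mos: float     # mean opinion score of this reference image


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_references(query_feature, database, k=3, n_candidates=10):
    """Return k references that are semantically close but differ in quality.

    Hypothetical heuristic: keep the n_candidates most similar images, sort them
    by MOS, and pick k of them at evenly spaced positions so the retrieved set
    covers a range of quality levels.
    """
    ranked = sorted(
        database,
        key=lambda ref: cosine_similarity(query_feature, ref.feature),
        reverse=True,
    )
    candidates = sorted(ranked[:n_candidates], key=lambda ref: ref.mos)
    if k >= len(candidates):
        return candidates
    if k == 1:
        return [candidates[len(candidates) // 2]]
    step = (len(candidates) - 1) / (k - 1)
    return [candidates[round(i * step)] for i in range(k)]
```

Sorting the candidates by MOS before sampling means the returned anchors are ordered from lowest to highest quality, which may make them easier to present in a prompt.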
📝 Abstract
Large Multimodal Models (LMMs) have recently shown remarkable promise in low-level visual perception tasks, particularly in Image Quality Assessment (IQA), where they demonstrate strong zero-shot capability. However, achieving state-of-the-art performance often requires computationally expensive fine-tuning, which aims to align the distribution of quality-related tokens in the output with image quality levels. Inspired by recent training-free works for LMMs, we introduce IQARAG, a novel training-free framework that enhances the IQA ability of LMMs. IQARAG leverages Retrieval-Augmented Generation (RAG) to retrieve, for the input image, reference images that are semantically similar but vary in quality, along with their corresponding Mean Opinion Scores (MOSs). The retrieved images and the input image are then integrated into a dedicated prompt, in which the retrieved images serve as visual perception anchors for the LMM on the IQA task. IQARAG consists of three key phases: Retrieval Feature Extraction, Image Retrieval, and Integration & Quality Score Generation. Extensive experiments across multiple diverse IQA datasets, including KADID, KonIQ, LIVE Challenge, and SPAQ, demonstrate that IQARAG effectively boosts the IQA performance of LMMs, offering a resource-efficient alternative to fine-tuning for quality assessment.
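The integration and score-generation phase might look roughly like the sketch below. The abstract does not give the prompt template or the scoring rule, so both are assumptions: the prompt wording and `<image:...>` placeholders are hypothetical, and the score is computed as the softmax-weighted expectation over five quality-level words, a common way to turn an LMM's quality-token distribution into a continuous score. The logits would come from the LMM; here they are plain numbers.

```python
import math

# Hypothetical quality-level words and their numeric values (1 = worst, 5 = best).
LEVELS = ["bad", "poor", "fair", "good", "excellent"]
LEVEL_VALUES = [1, 2, 3, 4, 5]


def build_prompt(references, max_score=100.0):
    """Assemble a prompt from (image_id, mos) pairs followed by the test image.

    The template and the <image:...> placeholders are illustrative assumptions;
    a real LMM interface would interleave actual image inputs at those points.
    """
    lines = ["Below are reference images with their quality scores, then a test image."]
    for i, (image_id, mos) in enumerate(references, start=1):
        lines.append(
            f"Reference {i}: <image:{image_id}> quality score {mos:.1f} / {max_score:.0f}."
        )
    lines.append("Test image: <image:query>. The quality of the test image is")
    return "\n".join(lines)


def expected_score(level_logits):
    """Softmax over the logits of the level words, then the expected level value."""
    m = max(level_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in level_logits]
    total = sum(exps)
    return sum(v * e / total for v, e in zip(LEVEL_VALUES, exps))


def to_mos_scale(score, lo=0.0, hi=100.0):
    """Linearly map a score in [1, 5] onto an assumed MOS range [lo, hi]."""
    return lo + (score - 1.0) / 4.0 * (hi - lo)
```

For example, `expected_score([0, 0, 0, 0, 0])` gives 3.0 (a uniform distribution over the five levels), which `to_mos_scale` maps to 50.0 on a 0 to 100 scale. Using the full token distribution rather than the single most likely word yields a continuous score that can be correlated with MOS.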