Q-Adapt: Adapting LMM for Visual Quality Assessment with Progressive Instruction Tuning

📅 2025-04-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current large multimodal models (LMMs) struggle to simultaneously deliver holistic quality explanations and attribute-level perceptual reasoning in explainable image quality assessment (EIQA), suffering from task interference that undermines perceptual understanding. To address this, we propose a perception-oriented progressive instruction-tuning paradigm: (1) decoupling and transferring generic perceptual knowledge in the first stage; and (2) enabling joint attribute- and holistic-quality reasoning via instruction-adaptive visual prompting in the second stage. Integrated with LoRA-based efficient fine-tuning, our approach yields a lightweight EIQA model. Extensive experiments on multiple perception-centric benchmarks and mainstream IQA datasets demonstrate performance on par with or surpassing state-of-the-art methods. Notably, our model is the first to achieve integrated high-fidelity, interpretable, and attribute-aware image quality assessment.

📝 Abstract
The rapid advancement of Large Multi-modal Foundation Models (LMM) has paved the way for Explainable Image Quality Assessment (EIQA) with instruction tuning from two perspectives: overall quality explanation, and attribute-wise perception answering. However, existing works usually overlook the conflicts between these two types of perception explanations during joint instruction tuning, leading to insufficient perception understanding. To mitigate this, we propose a new paradigm for perception-oriented instruction tuning, i.e., Q-Adapt, which aims to eliminate the conflicts and achieve synergy between these two EIQA tasks when adapting the LMM, resulting in enhanced multi-faceted explanations of IQA. In particular, we propose a progressive instruction tuning strategy that divides the adaptation of the LMM for EIQA into two stages: the first stage empowers the LMM with universal perception knowledge tailored for both tasks using an efficient transfer learning strategy, i.e., LoRA, and the second stage introduces instruction-adaptive visual prompt tuning to dynamically adapt visual features to the different instructions of the two tasks. In this way, our proposed Q-Adapt yields a lightweight visual quality evaluator, demonstrating comparable performance and, in some instances, superior results across perception-related benchmarks and commonly used IQA databases. The source code is publicly available at https://github.com/yeppp27/Q-Adapt.
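The first stage relies on LoRA, which freezes the pretrained weights and trains only a scaled low-rank residual. A minimal sketch of that mechanism (all dimensions, ranks, and initializations here are illustrative assumptions, not the paper's configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank, alpha = 64, 64, 4, 8  # assumed toy sizes

W = rng.standard_normal((d_out, d_in))        # frozen pretrained weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))                   # trainable up-projection, zero-init

def lora_forward(x):
    """Frozen path plus scaled low-rank residual: y = W x + (alpha/rank) * B A x."""
    return W @ x + (alpha / rank) * (B @ (A @ x))

x = rng.standard_normal(d_in)
y = lora_forward(x)
# With B zero-initialized, the adapted layer starts out identical to the frozen one,
# so training only perturbs the model through the low-rank A/B pair.
assert np.allclose(y, W @ x)
```

Because only `A` and `B` (rank × (d_in + d_out) parameters) are updated, the adapted model stays lightweight, which is what makes the resulting evaluator efficient to fine-tune.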
Problem

Research questions and friction points this paper is trying to address.

Resolves conflicts between overall and attribute-wise quality explanations in EIQA
Enhances multi-faceted image quality assessment via progressive instruction tuning
Achieves lightweight LMM adaptation for superior perceptual benchmark performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Progressive instruction tuning for EIQA
LoRA for efficient perception knowledge transfer
Instruction-adaptive visual prompt tuning
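The second-stage idea, conditioning visual features on the task instruction, can be sketched with a simple instruction-derived channel gate. This is a hedged illustration of the general mechanism; the gating form, shapes, and weight `W_gate` are assumptions for the sketch, not the paper's actual module:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 32            # assumed shared feature dimension
n_tokens = 16     # assumed number of visual tokens

W_gate = rng.standard_normal((d, d)) * 0.1  # hypothetical prompt-generation weight

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adapt_visual_features(visual_tokens, instruction_emb):
    """Modulate visual tokens with a gate derived from the instruction embedding,
    so the same image yields different features for different task instructions."""
    gate = sigmoid(W_gate @ instruction_emb)  # (d,), per-channel gate in (0, 1)
    return visual_tokens * gate               # broadcast over all tokens

V = rng.standard_normal((n_tokens, d))       # visual tokens for one image
inst_overall = rng.standard_normal(d)        # e.g. an overall-quality instruction
inst_attr = rng.standard_normal(d)           # e.g. an attribute-wise instruction

V_overall = adapt_visual_features(V, inst_overall)
V_attr = adapt_visual_features(V, inst_attr)
# Different instructions produce different adapted features from the same image.
assert V_overall.shape == V.shape and not np.allclose(V_overall, V_attr)
```

The point of the sketch is the routing: the instruction, not the image alone, decides which visual channels are emphasized, which is one way the two EIQA tasks can stop interfering with each other.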