Decoupling Perception and Calibration: Label-Efficient Image Quality Assessment Framework

📅 2026-01-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high annotation cost and limited deployability of multimodal large language models (MLLMs) for image quality assessment, which typically relies on extensive human-annotated mean opinion scores (MOS). The authors propose LEAF, a framework that, for the first time, decouples perceptual modeling from MOS scale calibration. LEAF uses a teacher MLLM to generate dense supervision signals—point-wise quality judgments and pair-wise preferences—which train a lightweight student regressor via knowledge distillation. Only a minimal number of MOS annotations are then required for the final calibration stage. Experiments on both user-generated and AI-generated image benchmarks show that LEAF drastically reduces annotation demands while maintaining strong correlation with human ratings.

📝 Abstract
Recent multimodal large language models (MLLMs) have demonstrated strong capabilities in image quality assessment (IQA) tasks. However, adapting such large-scale models is computationally expensive and still relies on substantial Mean Opinion Score (MOS) annotations. We argue that for MLLM-based IQA, the core bottleneck lies not in the quality perception capacity of MLLMs, but in MOS scale calibration. Therefore, we propose LEAF, a Label-Efficient Image Quality Assessment Framework that distills perceptual quality priors from an MLLM teacher into a lightweight student regressor, enabling MOS calibration with minimal human supervision. Specifically, the teacher provides dense supervision through point-wise judgments and pair-wise preferences, together with an estimate of decision reliability. Guided by these signals, the student learns the teacher's quality perception patterns through joint distillation and is then calibrated on a small MOS subset to align with human annotations. Experiments on both user-generated and AI-generated IQA benchmarks demonstrate that our method significantly reduces the need for human annotations while maintaining strong MOS-aligned correlations, making lightweight IQA practical under limited annotation budgets.
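The abstract describes a two-stage pipeline: a joint distillation objective that combines reliability-weighted point-wise and pair-wise teacher signals, followed by calibration of the student's scores on a small MOS subset. The sketch below illustrates one plausible form of those two pieces; the function names, loss shapes (MSE for point-wise, a Bradley-Terry-style preference loss for pair-wise), and the linear calibration map are assumptions for illustration, not the paper's exact formulation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def joint_distillation_loss(s_i, s_j, t_i, t_j, pref_ij, w_point, w_pair):
    """Reliability-weighted joint distillation loss (illustrative form).

    s_i, s_j        : student scores for images i and j
    t_i, t_j        : teacher MLLM point-wise quality judgments (assumed scalars)
    pref_ij         : teacher pair-wise preference, 1.0 if i beats j, else 0.0
    w_point, w_pair : teacher decision-reliability weights in [0, 1] (assumed)
    """
    # Point-wise term: pull the student toward the teacher's judgments.
    point = w_point * ((s_i - t_i) ** 2 + (s_j - t_j) ** 2)
    # Pair-wise term: Bradley-Terry-style preference loss on the score gap.
    p = sigmoid(s_i - s_j)
    pair = -w_pair * (pref_ij * math.log(p) + (1.0 - pref_ij) * math.log(1.0 - p))
    return point + pair

def calibrate(scores, mos):
    """Fit a linear map a*s + b from the student's scale to the MOS scale
    by least squares on a small labeled subset (the only human supervision)."""
    n = len(scores)
    mean_s, mean_m = sum(scores) / n, sum(mos) / n
    cov = sum((s - mean_s) * (m - mean_m) for s, m in zip(scores, mos))
    var = sum((s - mean_s) ** 2 for s in scores)
    a = cov / var
    b = mean_m - a * mean_s
    return a, b
```

Because the distillation stage needs no MOS labels at all, the annotation budget is spent entirely on the handful of images used by `calibrate`, which is what makes the framework label-efficient.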
Problem

Research questions and friction points this paper is trying to address.

Image Quality Assessment
Mean Opinion Score
Label Efficiency
Multimodal Large Language Models
Calibration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Label-Efficient Learning
Knowledge Distillation
Image Quality Assessment
MOS Calibration
Multimodal Large Language Models