DPC-VQA: Decoupling Quality Perception and Residual Calibration for Video Quality Assessment

📅 2026-04-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the high cost of adapting multimodal large language model (MLLM)-based video quality assessment (VQA) methods to new scenarios, which typically requires extensive retraining and abundant subjective annotations. The authors propose a decoupled paradigm that freezes the pretrained MLLM to provide a fixed perceptual quality prior and trains only a lightweight calibration branch to learn residual corrections. Video quality assessment is thus decomposed into two stages, fixed perception and learnable calibration, which cuts trainable parameters to less than 2% of those used by conventional MLLM-based VQA methods and lets the method remain effective with only 20% of the mean opinion score (MOS) labels. Despite this efficiency, the method achieves competitive performance against representative baselines on both user-generated content (UGC) and AI-generated content (AIGC) benchmarks.

📝 Abstract

Recent multimodal large language models (MLLMs) have shown promising performance on video quality assessment (VQA) tasks. However, adapting them to new scenarios remains expensive due to large-scale retraining and costly mean opinion score (MOS) annotations. In this paper, we argue that a pretrained MLLM already provides a useful perceptual prior for VQA, and that the main challenge is to efficiently calibrate this prior to the target MOS space. Based on this insight, we propose DPC-VQA, a framework that decouples perception from calibration for video quality assessment. Specifically, DPC-VQA uses a frozen MLLM to provide a base quality estimate and perceptual prior, and employs a lightweight calibration branch to predict a residual correction for target-scenario adaptation. This design avoids costly end-to-end retraining while maintaining reliable performance with lower training and data costs. Extensive experiments on both user-generated content (UGC) and AI-generated content (AIGC) benchmarks show that DPC-VQA achieves competitive performance against representative baselines, while using less than 2% of the trainable parameters of conventional MLLM-based VQA methods and remaining effective with only 20% of MOS labels. The code will be released upon publication.
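
The decoupling described in the abstract can be illustrated with a toy sketch: a frozen scorer supplies the base estimate, and a small trainable head learns only the residual between that estimate and the target MOS. Everything below is an assumption made for illustration, not the authors' implementation: `frozen_base_score` is a hand-written stand-in for the frozen MLLM, `ResidualCalibrator` is a plain linear head, the MOS values are synthetic, and the 20-of-100 labeled split merely mirrors the 20% label budget mentioned above.

```python
import random


def frozen_base_score(features):
    """Hypothetical stand-in for the frozen MLLM's base quality estimate."""
    # Fixed coefficients that are never updated, mimicking a frozen prior.
    return 3.0 + 0.5 * features[0]


def target_mos(features):
    """Synthetic target-domain MOS: the base estimate plus a systematic shift."""
    return frozen_base_score(features) + 0.8 * features[1] - 0.4


class ResidualCalibrator:
    """Lightweight linear head that predicts a correction to the base score."""

    def __init__(self, dim):
        self.w = [0.0] * dim
        self.b = 0.0

    def residual(self, features):
        return sum(w * x for w, x in zip(self.w, features)) + self.b

    def fit(self, samples, lr=0.05, epochs=200):
        # samples: list of (features, mos). Only w and b are updated;
        # the base scorer stays fixed throughout training.
        for _ in range(epochs):
            for x, mos in samples:
                residual_target = mos - frozen_base_score(x)
                err = self.residual(x) - residual_target
                self.w = [w - lr * err * xi for w, xi in zip(self.w, x)]
                self.b -= lr * err


def predict(calibrator, features):
    """Final score = frozen base estimate + learned residual correction."""
    return frozen_base_score(features) + calibrator.residual(features)


# Toy experiment: 100 synthetic videos, of which only 20 carry MOS labels.
rng = random.Random(0)
pool = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(100)]
labeled = pool[:20]
calibrator = ResidualCalibrator(dim=2)
calibrator.fit([(x, target_mos(x)) for x in labeled])


def mean_abs_error(score_fn):
    """Mean absolute error against the synthetic MOS on the unlabeled videos."""
    held_out = pool[20:]
    return sum(abs(score_fn(x) - target_mos(x)) for x in held_out) / len(held_out)


base_error = mean_abs_error(frozen_base_score)
calibrated_error = mean_abs_error(lambda x: predict(calibrator, x))
print(f"base MAE: {base_error:.3f}, calibrated MAE: {calibrated_error:.3f}")
```

In this toy setting the calibrator has three trainable numbers while the base scorer is left untouched, which is the sense in which adaptation cost is decoupled from the size of the perception model. A real calibration branch would presumably consume MLLM features rather than two hand-picked scalars.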

Problem

Research questions and friction points this paper is trying to address.

video quality assessment

multimodal large language models

mean opinion score

model adaptation

annotation efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

Decoupling

Perceptual Prior

Residual Calibration