🤖 AI Summary
This work addresses no-reference video quality assessment (NR-VQA) for gaming videos, which is difficult because human annotations are scarce and the content itself is complex: rapid motion, stylized graphics, and compression artifacts. To tackle these issues, the authors propose MTL-VQA, a multi-task learning framework that leverages full-reference (FR) quality metrics as self-supervised signals, pretraining perceptual features without requiring human labels. By adaptively weighting and jointly optimizing multiple FR objectives, the model learns transferable shared representations suited to NR-VQA. Experiments on gaming video datasets show performance on par with state-of-the-art NR-VQA approaches in MOS-supervised, label-efficient, and fully self-supervised settings.
📝 Abstract
No-reference video quality assessment (NR-VQA) for gaming videos is challenging due to limited human-rated datasets and unique content characteristics, including fast motion, stylized graphics, and compression artifacts. We present MTL-VQA, a multi-task learning framework that uses full-reference (FR) metrics as supervisory signals for pretraining, learning perceptually meaningful features without human labels. By jointly optimizing multiple FR objectives with adaptive task weighting, our approach learns shared representations that transfer effectively to NR-VQA. Experiments on gaming video datasets show MTL-VQA achieves performance competitive with state-of-the-art NR-VQA methods across both MOS-supervised and label-efficient/self-supervised settings.
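To make the "adaptive task weighting" idea concrete, here is a minimal sketch of one common way to combine several per-task losses with learned weights: homoscedastic-uncertainty weighting, where each task gets a learnable log-variance `s_i` and contributes `exp(-s_i) * L_i + s_i` to the total. The function name, the specific FR objectives named in the comments, and the choice of this particular weighting scheme are illustrative assumptions, not the paper's confirmed formulation.

```python
import math

def combined_loss(task_losses, log_vars):
    """Adaptively weighted multi-task loss (illustrative sketch).

    Each per-task loss L_i is scaled by exp(-s_i), where s_i is a learnable
    log-variance; adding s_i back as a regularizer keeps the optimizer from
    driving all weights to zero. Tasks whose learned uncertainty grows are
    down-weighted automatically during training.
    """
    return sum(math.exp(-s) * loss + s
               for loss, s in zip(task_losses, log_vars))

# Hypothetical example: three FR-metric regression losses
# (e.g., PSNR-, SSIM-, and VMAF-based targets).
losses = [0.8, 0.5, 1.2]     # per-task loss values at some training step
log_vars = [0.0, 0.0, 0.0]   # equal weighting at initialization

print(combined_loss(losses, log_vars))  # exp(0) weights: 0.8 + 0.5 + 1.2 = 2.5
```

In practice the `log_vars` would be framework parameters updated by backpropagation alongside the shared backbone, so the balance between FR objectives shifts as training progresses.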