🤖 AI Summary
Existing video quality assessment (VQA) methods typically rely on monolithic architectures producing scalar scores, suffering from poor interpretability, limited cross-format generalization, and an inability to diagnose specific distortion types. To address these limitations, the authors propose Unified-VQA, a framework that recasts generic VQA as a Diagnostic Mixture-of-Experts (MoE) problem: multiple perceptual experts, each dedicated to a distinct perceptual domain, are optimized with a multi-proxy, ranking-inspired training strategy, while a diagnostic multi-task head jointly produces a global quality score and an interpretable multi-dimensional artifact vector, trained with weak supervision on a large-scale database constructed for this work. With static model parameters (no retraining or fine-tuning), Unified-VQA consistently outperforms over 18 benchmark methods across 17 databases spanning HD, UHD, HDR, and HFR formats, on both generic VQA and diagnostic artifact detection tasks, marking a step towards practical, actionable, and interpretable video quality assessment.
📝 Abstract
Recent works in video quality assessment (VQA) typically employ monolithic models that predict a single quality score for each test video. These approaches cannot provide diagnostic, interpretable feedback, offering little insight into why video quality is degraded. Most of them are also specialized, format-specific metrics rather than truly ``generic'' solutions, as they are forced to learn a compromised representation across disparate perceptual domains. To address these limitations, this paper proposes Unified-VQA, a framework that provides a single, unified quality model applicable to diverse distortion types across multiple video formats by recasting generic VQA as a Diagnostic Mixture-of-Experts (MoE) problem. Unified-VQA employs multiple ``perceptual experts'' dedicated to distinct perceptual domains. A novel multi-proxy expert training strategy optimizes each expert using a ranking-inspired loss, guided by the most suitable proxy metric for its domain. We also integrate a diagnostic multi-task head that generates a global quality score and an interpretable multi-dimensional artifact vector, optimized with a weakly-supervised learning strategy that leverages the known properties of the large-scale training database generated for this work. With static model parameters (without retraining or fine-tuning), Unified-VQA demonstrates consistent and superior performance against over 18 benchmark methods on both generic VQA and diagnostic artifact detection tasks across 17 databases containing diverse streaming artifacts in HD, UHD, HDR, and HFR formats. This work represents an important step towards practical, actionable, and interpretable video quality assessment.
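To make the architectural idea concrete, the following is a minimal PyTorch sketch of a diagnostic MoE of the kind the abstract describes: gated perceptual experts feeding a multi-task head that emits both a global quality score and an artifact vector, plus a pairwise ranking-style loss. All dimensions, module names, and the specific loss form are illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn

class DiagnosticMoESketch(nn.Module):
    """Illustrative diagnostic Mixture-of-Experts (hypothetical, not the
    authors' implementation): per-domain experts, a softmax gate, and a
    multi-task head producing a quality score plus an artifact vector."""

    def __init__(self, feat_dim=256, num_experts=4, num_artifacts=6):
        super().__init__()
        # One small expert per assumed perceptual domain.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU())
            for _ in range(num_experts)
        )
        # Gate mixes expert outputs per input video feature.
        self.gate = nn.Sequential(nn.Linear(feat_dim, num_experts), nn.Softmax(dim=-1))
        # Diagnostic multi-task head: global score + per-artifact probabilities.
        self.score_head = nn.Linear(128, 1)
        self.artifact_head = nn.Linear(128, num_artifacts)

    def forward(self, feats):
        w = self.gate(feats)                                     # (B, E)
        mix = torch.stack([e(feats) for e in self.experts], 1)   # (B, E, 128)
        fused = (w.unsqueeze(-1) * mix).sum(dim=1)               # (B, 128)
        score = self.score_head(fused).squeeze(-1)               # (B,)
        artifacts = torch.sigmoid(self.artifact_head(fused))     # (B, A)
        return score, artifacts

def pairwise_ranking_loss(score_hi, score_lo, margin=0.5):
    """Hinge-style pairwise ranking loss: penalize pairs where the
    higher-quality video is not scored above the lower-quality one."""
    return torch.relu(margin - (score_hi - score_lo)).mean()
```

In a training loop, pairs of videos with a known quality ordering (e.g. from proxy metrics, as the abstract's multi-proxy strategy suggests) would drive `pairwise_ranking_loss`, while the artifact vector would be supervised weakly from known properties of the training data.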