🤖 AI Summary
This paper addresses the weak bitrate-quality ranking consistency of existing video quality assessment (VQA) metrics by introducing a benchmark dataset and an evaluation metric for compression-oriented VQA. The dataset comprises 6,240 video clips, derived from 59 source videos encoded with 186 codec-preset variants, and fuses 1.8 million pairwise comparisons and 1,500 Mean Opinion Score (MOS) ratings into a single quality scale. To quantify how well a model preserves the ordering between bitrate and perceived quality, the authors introduce the Rate-Distortion Alignment Error (RDAE) metric. Experiments on IQA/VQA methods show that popular VQA metrics exhibit high RDAE and relatively low correlation with human judgments, underscoring the dataset's difficulty and its practical utility for VQA research. Part of the dataset is kept hidden for blind evaluation; the open part and the evaluation results are publicly available to support codec parameter tuning and VQA model validation.
📝 Abstract
We propose the LEHA-CVQAD (Large-scale Enriched Human-Annotated) dataset, which comprises 6,240 clips for compression-oriented video quality assessment. The clips are produced by encoding 59 source videos with 186 codec-preset variants, and 1.8M pairwise comparisons and 1.5k MOS ratings are fused into a single quality scale; part of the videos remains hidden for blind evaluation. We also propose Rate-Distortion Alignment Error (RDAE), a novel evaluation metric that quantifies how well VQA models preserve bitrate-quality ordering, directly supporting codec parameter tuning. Testing IQA/VQA methods reveals that popular VQA metrics exhibit high RDAE and relatively low correlations with subjective scores, underscoring the dataset's difficulty and utility. The open part and the results of LEHA-CVQAD are available at https://aleksandrgushchin.github.io/lcvqad/