🤖 AI Summary
Subjective video quality assessment (VQA) methods, namely Absolute Category Rating (ACR), ACR with Hidden Reference (ACR-HR), and Comparison Category Rating (CCR), differ in statistical power and cost-efficiency under realistic distortions (blur, scaling, compression, freezing, and their combinations) and in a bitrate-ladder task at 720p and 1080p. Method: Following ITU-T P.910, six crowdsourced studies compared the three methods side by side on 15 talking-head sources of good and fair quality, and the collected ratings were analyzed for statistical efficiency, economic efficiency, and decision agreement. Contribution/Results: ACR-HR is the most operationally efficient method (approximately twice as fast and cost-effective, with lower normalized variability) and correlates strongly with ACR at the condition level, but it shows compressed scale use, particularly for fair-quality sources. CCR is more sensitive and can capture quality improvements beyond the reference. Critically, the choice of method shifts saturation points and bitrate-ladder recommendations, showing that VQA methodology directly affects QoE optimization decisions. The study provides practical, empirically grounded guidance on when to use each method.
📝 Abstract
In crowdsourced subjective video quality assessment, practitioners often face a choice between Absolute Category Rating (ACR), ACR with Hidden Reference (ACR-HR), and Comparison Category Rating (CCR). We conducted a P.910-compliant, side-by-side comparison across six studies using 15 talking-head sources of good and fair quality, processed with realistic degradations (blur, scaling, compression, freezing, and their combinations), as well as a practical bitrate-ladder task at 720p and 1080p resolutions. We evaluated statistical efficiency (standard deviations), economic efficiency, and decision agreement. Our results show that ACR-HR and ACR correlate strongly at the condition level, while CCR is more sensitive, capturing improvements beyond the reference. ACR-HR, however, exhibits compressed scale use, particularly for videos with fair source quality. ACR-HR is approximately twice as fast and cost-effective, with lower normalized variability, yet the choice of quality measurement method shifts saturation points and bitrate-ladder recommendations. Finally, we provide practical guidance on when to use each test method.
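To make the difference between ACR and ACR-HR scores concrete, the following is a minimal sketch, not the authors' analysis code. It assumes the differential score defined in ITU-T P.910, DV = V(PVS) - V(REF) + 5, computed per subject and then averaged, and it omits the optional crushing of values above 5. The function names and the ratings are hypothetical and chosen only for illustration.

```python
from statistics import mean


def acr_mos(ratings):
    """Mean opinion score from ACR ratings on the 1-5 scale."""
    return mean(ratings)


def acr_hr_dmos(pvs_ratings, ref_ratings):
    """Differential MOS for ACR-HR (assumed P.910 form, no crushing).

    Each subject's rating of the processed video (PVS) is offset by the
    same subject's rating of the hidden reference:
    DV = V(PVS) - V(REF) + 5.
    """
    if len(pvs_ratings) != len(ref_ratings):
        raise ValueError("need one reference rating per PVS rating")
    return mean(p - r + 5 for p, r in zip(pvs_ratings, ref_ratings))


# Hypothetical ratings from four subjects (illustrative only).
good_ref = [5, 5, 4, 5]   # hidden reference, good source quality
good_pvs = [3, 3, 2, 3]   # degraded version of the good source
fair_ref = [3, 3, 3, 3]   # hidden reference, fair source quality
fair_pvs = [2, 2, 2, 2]   # degraded version of the fair source

print(acr_mos(good_pvs), acr_hr_dmos(good_pvs, good_ref))
print(acr_mos(fair_pvs), acr_hr_dmos(fair_pvs, fair_ref))
```

In this toy data the fair-quality source can lose at most two points before reaching the bottom of the scale, so its differential scores occupy a narrower range than those of the good-quality source. That is one plausible reading of the compressed scale use reported above, offered as an assumption rather than as the paper's own explanation.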