AI Summary
Existing video quality assessment methods are primarily designed for standard dynamic range (SDR) content and struggle to effectively model distortions unique to high dynamic range (HDR) user-generated content (UGC), such as near-black crushing, highlight clipping, color banding, and exposure flickering. To address this gap, this work presents Beyond8Bits, the first large-scale subjective quality dataset for HDR-UGC, and introduces HDR-Q, the first multimodal large language model tailored for HDR-UGC quality assessment. HDR-Q integrates an HDR-perception-aware visual encoder with a reinforcement learning-based HDR-Aware Policy Optimization (HAPO) framework, enhanced by contrastive KL regularization, Gaussian-weighted regression rewards, and crowd rating modeling to accurately capture HDR-specific distortions. Extensive evaluations on Beyond8Bits and public HDR-VQA benchmarks demonstrate that HDR-Q significantly outperforms existing approaches, achieving state-of-the-art performance.
Abstract
High Dynamic Range (HDR) user-generated content (UGC) videos are rapidly proliferating across social platforms, yet most perceptual video quality assessment (VQA) systems remain tailored to Standard Dynamic Range (SDR). HDR offers a higher bit depth, a wider color gamut, and an extended luminance range, which expose distortions such as near-black crushing, highlight clipping, banding, and exposure flicker. These distortions amplify UGC artifacts and challenge models built for SDR. To catalyze progress, we curate Beyond8Bits, a large-scale subjective dataset of 44K videos from 6.5K sources with over 1.5M crowd ratings, spanning diverse scenes, capture conditions, and compression settings. We further introduce HDR-Q, the first Multimodal Large Language Model (MLLM) for HDR-UGC VQA. We propose (i) a novel HDR-aware vision encoder that produces HDR-sensitive embeddings, and (ii) HDR-Aware Policy Optimization (HAPO), a reinforcement learning (RL) fine-tuning framework that anchors reasoning to HDR cues. HAPO augments Group Relative Policy Optimization (GRPO) with an HDR-SDR contrastive KL term that encourages token reliance on HDR inputs and a Gaussian-weighted regression reward for fine-grained mean opinion score (MOS) calibration. Across Beyond8Bits and public HDR-VQA benchmarks, HDR-Q delivers state-of-the-art performance.
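To make the reward idea concrete, the following is a minimal sketch of what a Gaussian-weighted regression reward combined with GRPO-style group normalization could look like. The abstract does not give the exact formulation, so everything here is an assumption for illustration: the functional form `exp(-(pred - mos)^2 / (2 * sigma^2))`, the `sigma` value, the 1-to-5 score scale, and the helper names `gaussian_reward` and `group_relative_advantages` are hypothetical and not taken from the paper.

```python
import math


def gaussian_reward(predicted, mos, sigma=0.5):
    """Assumed reward shape: 1.0 when the prediction equals the MOS,
    decaying smoothly as the absolute error grows (sigma is a guess)."""
    return math.exp(-((predicted - mos) ** 2) / (2.0 * sigma ** 2))


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style normalization: center and scale rewards within one
    group of responses sampled for the same video and prompt."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


# Four hypothetical sampled quality predictions for one video whose MOS is 4.0.
predictions = [3.9, 4.1, 2.5, 4.0]
rewards = [gaussian_reward(p, mos=4.0) for p in predictions]
advantages = group_relative_advantages(rewards)
```

Under these assumptions, predictions close to the MOS receive rewards near 1 and positive advantages, while the outlier at 2.5 receives a reward near 0 and a negative advantage. A smooth reward of this kind distinguishes a near miss from a large error, which a binary correct/incorrect reward cannot do.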