CompressedVQA-HDR: Generalized Full-reference and No-reference Quality Assessment Models for Compressed High Dynamic Range Videos

๐Ÿ“… 2025-07-16
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Existing full-reference (FR) and no-reference (NR) video quality assessment (VQA) methods generalize poorly to compressed high-dynamic-range (HDR) video. To address this, the paper proposes CompressedVQA-HDR, a unified FR and NR VQA framework for HDR content. The FR model adopts the Swin Transformer as its backbone and computes deep structural and textural similarities between reference and distorted frames from intermediate-layer features; the NR model adopts SigLip 2, previously unexplored in VQA, as its feature extractor and uses the global mean of its final-layer feature maps as the quality-aware representation. To mitigate the scarcity of HDR training data, the FR model is pre-trained on a large-scale SDR VQA dataset and fine-tuned on HDRSDR-VQA, while the NR model undergoes iterative mixed-dataset training across multiple compressed VQA datasets before the same fine-tuning. Evaluations on multiple HDR VQA benchmarks show state-of-the-art performance, and CompressedVQA-HDR-FR won first place in the FR track of the Generalizable HDR & SDR Video Quality Measurement Grand Challenge at IEEE ICME 2025.

๐Ÿ“ Abstract
Video compression is a standard procedure applied to all videos to minimize storage and transmission demands while preserving visual quality as much as possible. Therefore, evaluating the visual quality of compressed videos is crucial for guiding the practical usage and further development of video compression algorithms. Although numerous compressed video quality assessment (VQA) methods have been proposed, they often lack the generalization capability needed to handle the increasing diversity of video types, particularly high dynamic range (HDR) content. In this paper, we introduce CompressedVQA-HDR, an effective VQA framework designed to address the challenges of HDR video quality assessment. Specifically, we adopt the Swin Transformer and SigLip 2 as the backbone networks for the proposed full-reference (FR) and no-reference (NR) VQA models, respectively. For the FR model, we compute deep structural and textural similarities between reference and distorted frames using intermediate-layer features extracted from the Swin Transformer as its quality-aware feature representation. For the NR model, we extract the global mean of the final-layer feature maps from SigLip 2 as its quality-aware representation. To mitigate the issue of limited HDR training data, we pre-train the FR model on a large-scale standard dynamic range (SDR) VQA dataset and fine-tune it on the HDRSDR-VQA dataset. For the NR model, we employ an iterative mixed-dataset training strategy across multiple compressed VQA datasets, followed by fine-tuning on the HDRSDR-VQA dataset. Experimental results show that our models achieve state-of-the-art performance compared to existing FR and NR VQA models. Moreover, CompressedVQA-HDR-FR won first place in the FR track of the Generalizable HDR & SDR Video Quality Measurement Grand Challenge at IEEE ICME 2025. The code is available at https://github.com/sunwei925/CompressedVQA-HDR.
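The abstract describes the FR model's quality-aware representation as deep structural and textural similarities computed between reference and distorted intermediate-layer features. A minimal sketch of one common form of such a measure (mean-based texture term plus covariance-based structure term, in the style of deep structure/texture similarity metrics) is shown below on dummy arrays; the exact formula, constants, and use of random arrays in place of Swin Transformer features are illustrative assumptions, not the paper's verified implementation.

```python
import numpy as np

def structure_texture_similarity(feat_ref, feat_dist, c1=1e-6, c2=1e-6):
    """Similarity between two feature maps of shape (C, H, W).

    The texture term compares channel-wise means; the structure term
    compares channel-wise variances and covariance. Both terms are
    bounded above by 1, with equality for identical features.
    """
    x = feat_ref.reshape(feat_ref.shape[0], -1)    # (C, H*W)
    y = feat_dist.reshape(feat_dist.shape[0], -1)
    mu_x, mu_y = x.mean(axis=1), y.mean(axis=1)
    var_x, var_y = x.var(axis=1), y.var(axis=1)
    cov_xy = ((x - mu_x[:, None]) * (y - mu_y[:, None])).mean(axis=1)
    texture = (2 * mu_x * mu_y + c1) / (mu_x**2 + mu_y**2 + c1)
    structure = (2 * cov_xy + c2) / (var_x + var_y + c2)
    return float(np.mean((texture + structure) / 2))

# Identical features give similarity 1; distortion lowers the score.
rng = np.random.default_rng(0)
ref = rng.standard_normal((8, 4, 4))   # stand-in for Swin features
print(round(structure_texture_similarity(ref, ref), 6))  # -> 1.0
```

In the paper, these similarities are computed per intermediate layer of the Swin Transformer and combined into the FR quality prediction.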
Problem

Research questions and friction points this paper is trying to address.

Assessing quality of compressed HDR videos accurately
Overcoming generalization limitations in existing VQA methods
Addressing limited HDR training data for VQA models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Swin Transformer for FR VQA model
Employs SigLip 2 for NR VQA model
Pre-trains on SDR dataset for HDR adaptation
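For the NR branch, the abstract specifies global mean pooling of the final-layer feature maps as the quality-aware representation. A minimal sketch of that pooling followed by a regression head is below; the linear head (`w`, `b`) and the random stand-in for SigLip 2 features are illustrative assumptions.

```python
import numpy as np

def nr_quality_score(final_feats, w, b):
    """Global mean pooling over a final-layer feature map (C, H, W),
    then a linear regressor mapping the pooled vector to a scalar
    quality score. The pooling follows the paper; the linear head
    is a hypothetical stand-in for the trained regression layer.
    """
    rep = final_feats.mean(axis=(1, 2))   # global mean -> (C,)
    return float(rep @ w + b)             # scalar quality prediction

rng = np.random.default_rng(0)
feats = rng.standard_normal((16, 7, 7))   # stand-in for SigLip 2 output
w = rng.standard_normal(16) * 0.1
score = nr_quality_score(feats, w, b=3.0)
print(score)
```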
๐Ÿ”Ž Similar Papers
No similar papers found.