ReLaX-VQA: Residual Fragment and Layer Stack Extraction for Enhancing Video Quality Assessment

๐Ÿ“… 2024-07-16
๐Ÿ›๏ธ arXiv.org
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
To address the challenge of no-reference quality assessment of multi-generation compressed video in user-generated content (UGC), this paper proposes a novel no-reference video quality assessment (NR-VQA) method. The approach jointly models residual-frame fragments (patches) and optical flow, and stacks multi-level features from both a CNN (ResNet) and a Vision Transformer (ViT) to enhance spatiotemporal perception and semantic abstraction. Key components include residual frame extraction, optical flow estimation, multi-scale spatial representation learning, and cross-architecture hierarchical feature fusion, optimized end-to-end. Evaluated on four mainstream UGC datasets, the method achieves an average Spearman rank-order correlation coefficient (SRCC) of 0.8658 and Pearson linear correlation coefficient (PLCC) of 0.8872, substantially outperforming state-of-the-art methods. The source code and pre-trained models are publicly available.
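To make the residual-fragment idea above concrete, here is a minimal PyTorch sketch of sampling the most active patches from the difference between consecutive frames. The fragment size (32), fragment count (16), and energy-based ranking are illustrative assumptions, and `residual_fragments` is a hypothetical helper, not the paper's actual recipe.

```python
import torch

def residual_fragments(prev_frame: torch.Tensor,
                       curr_frame: torch.Tensor,
                       frag_size: int = 32,
                       num_frags: int = 16) -> torch.Tensor:
    """Select the most active residual patches between consecutive frames.

    Both frames: (C, H, W) tensors in [0, 1].
    Returns: (num_frags, C, frag_size, frag_size) fragments.
    """
    residual = curr_frame - prev_frame  # temporal residual between frames
    c, _, _ = residual.shape
    # Tile into non-overlapping frag_size x frag_size patches
    # (any remainder at the borders is dropped by unfold).
    patches = (residual.unfold(1, frag_size, frag_size)
                       .unfold(2, frag_size, frag_size)
                       .permute(1, 2, 0, 3, 4)
                       .reshape(-1, c, frag_size, frag_size))
    # Rank patches by residual energy (mean absolute difference)
    # and keep the most active ones.
    energy = patches.abs().mean(dim=(1, 2, 3))
    top = energy.topk(min(num_frags, len(energy))).indices
    return patches[top]

# Example on two random frames.
prev, curr = torch.rand(3, 1056, 1920), torch.rand(3, 1056, 1920)
print(residual_fragments(prev, curr).shape)  # torch.Size([16, 3, 32, 32])
```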

๐Ÿ“ Abstract
With the rapid growth of User-Generated Content (UGC) exchanged between users and sharing platforms, the need for video quality assessment in the wild has emerged. UGC is mostly acquired using consumer devices and undergoes multiple rounds of compression or transcoding before reaching the end user. Therefore, traditional quality metrics that require the original content as a reference cannot be used. In this paper, we propose ReLaX-VQA, a novel No-Reference Video Quality Assessment (NR-VQA) model that addresses the challenges of evaluating diverse video content and assessing its quality without reference videos. ReLaX-VQA uses fragments of residual frames and optical flow, along with different expressions of spatial features of the sampled frames, to enhance motion and spatial perception. Furthermore, the model enhances abstraction by employing layer-stacking techniques in deep neural network features (from Residual Networks and Vision Transformers). Extensive testing on four UGC datasets confirms that ReLaX-VQA outperforms existing NR-VQA methods with an average SRCC value of 0.8658 and PLCC value of 0.8872. We will open-source the code and trained models to facilitate further research and applications of NR-VQA: https://github.com/xinyiW915/ReLaX-VQA.
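As a hedged illustration of the optical-flow branch mentioned in the abstract, the sketch below estimates per-pixel motion between two consecutive frames. The abstract does not name the flow estimator actually used, so torchvision's RAFT (`raft_small`) serves purely as a stand-in.

```python
import torch
from torchvision.models.optical_flow import raft_small, Raft_Small_Weights

weights = Raft_Small_Weights.DEFAULT
model = raft_small(weights=weights).eval()
preprocess = weights.transforms()  # normalizes the frame pair for RAFT

# Two consecutive frames; H and W must be divisible by 8 for RAFT.
frame1 = torch.rand(1, 3, 360, 640)
frame2 = torch.rand(1, 3, 360, 640)
frame1, frame2 = preprocess(frame1, frame2)

with torch.no_grad():
    # RAFT returns a list of iteratively refined flow fields; keep the last.
    flow = model(frame1, frame2)[-1]  # (1, 2, H, W): dx, dy per pixel

print(flow.shape)  # torch.Size([1, 2, 360, 640])
```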
Problem

Research questions and friction points this paper is trying to address.

Develops a no-reference video quality assessment model for UGC.
Addresses the challenge of assessing the quality of compressed video content.
Enhances quality assessment using spatio-temporal fragments and deep learning.
Innovation

Methods, ideas, or system contributions that make the work stand out.

No-Reference Video Quality Assessment model
Frame differences for spatio-temporal fragment selection
Layer stacking of deep neural network features (see the sketch after this list)
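A minimal sketch of the layer-stacking idea, assuming torchvision's ResNet-50 and ViT-B/16 as stand-ins for the paper's backbones; which layers are hooked and how they are pooled here are illustrative choices, not the paper's exact recipe.

```python
import torch
from torchvision.models import resnet50, vit_b_16

resnet = resnet50(weights=None).eval()
vit = vit_b_16(weights=None).eval()

features = {}

def save_to(name):
    def hook(module, inputs, output):
        features[name] = output
    return hook

# Hook intermediate stages of both backbones.
for name in ["layer2", "layer3", "layer4"]:
    getattr(resnet, name).register_forward_hook(save_to(f"resnet.{name}"))
for idx in [5, 8, 11]:
    vit.encoder.layers[idx].register_forward_hook(save_to(f"vit.block{idx}"))

with torch.no_grad():
    x = torch.rand(1, 3, 224, 224)
    resnet(x)
    vit(x)

# Pool every hooked layer to one vector per image, then stack.
pooled = []
for feat in features.values():
    if feat.dim() == 4:                 # CNN feature map: (B, C, H, W)
        pooled.append(feat.mean(dim=(2, 3)))
    else:                               # ViT token sequence: (B, N, D)
        pooled.append(feat.mean(dim=1))
stacked = torch.cat(pooled, dim=1)      # layer-stacked representation
print(stacked.shape)                    # e.g. torch.Size([1, 5888])
```

The stacked vector combines mid- and high-level CNN maps with ViT token features, which is the cross-architecture fusion the Innovation list refers to; a regression head mapping it to a quality score is omitted here.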
๐Ÿ”Ž Similar Papers
No similar papers found.
Xinyi Wang
Visual Information Lab, School of Computer Science, University of Bristol, Bristol BS1 8UB, UK
Angeliki V. Katsenou
Visual Information Lab, School of Computer Science, University of Bristol, Bristol BS1 8UB, UK
David R. Bull
Visual Information Lab, School of Computer Science, University of Bristol, Bristol BS1 8UB, UK