DIVA-VQA: Detecting Inter-frame Variations in UGC Video Quality

📅 2025-08-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address no-reference perceptual video quality assessment (NR-VQA) for user-generated content (UGC) videos, where no pristine reference is available, this paper proposes a spatio-temporal fragmentation method driven by inter-frame variations. Inter-frame differences guide spatial fragmentation, and a residual alignment mechanism fuses fragmented frames with multi-scale spatio-temporal features, yielding a hierarchical perception network that jointly localizes quality-sensitive regions and models dynamic quality degradation. The framework employs 2D and 3D convolutions jointly to extract frame-level, fragment-level, and residual-level features. Evaluated on five UGC video datasets, it achieves average SROCC scores of 0.898 (DIVA-VQA-L) and 0.886 (DIVA-VQA-B), outperforming state-of-the-art methods while remaining efficient at inference. The source code is publicly available.
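The joint 2D/3D feature extraction described above can be sketched as follows. This is a rough illustration only, not the DIVA-VQA backbone: the module name, layer widths, and pooling choices are assumptions made for the example.

```python
import torch
import torch.nn as nn

class SpatioTemporalFeatures(nn.Module):
    """Hypothetical sketch: a 2D path captures per-frame (spatial) detail,
    a 3D path captures clip-level (temporal) variation; their pooled
    features are concatenated. Sizes are illustrative assumptions."""
    def __init__(self, channels=3, dim=16):
        super().__init__()
        self.spatial = nn.Conv2d(channels, dim, kernel_size=3, padding=1)
        self.temporal = nn.Conv3d(channels, dim, kernel_size=3, padding=1)

    def forward(self, clip):  # clip: (batch, channels, frames, H, W)
        b, c, t, h, w = clip.shape
        # 2D path: fold the frame axis into the batch, convolve each frame
        frames = clip.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
        feat2d = self.spatial(frames).mean(dim=(2, 3))      # (b*t, dim)
        feat2d = feat2d.reshape(b, t, -1).mean(dim=1)       # (b, dim)
        # 3D path: convolve jointly over space and time
        feat3d = self.temporal(clip).mean(dim=(2, 3, 4))    # (b, dim)
        return torch.cat([feat2d, feat3d], dim=1)           # (b, 2*dim)
```

A short sanity check: an 8-frame RGB clip at 32x32 yields a 32-dimensional feature per clip with the default `dim=16`.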

📝 Abstract
The rapid growth of user-generated content (UGC) has driven increased demand for research on no-reference (NR) perceptual video quality assessment (VQA). NR-VQA is a key component for large-scale video quality monitoring in social media and streaming applications where a pristine reference is not available. This paper proposes a novel NR-VQA model based on spatio-temporal fragmentation driven by inter-frame variations. By leveraging these inter-frame differences, the model progressively analyses quality-sensitive regions at multiple levels: frames, patches, and fragmented frames. It integrates frames, fragmented residuals, and fragmented frames aligned with residuals to effectively capture global and local information. The model extracts both 2D and 3D features in order to characterize these spatio-temporal variations. Experiments conducted on five UGC datasets and against state-of-the-art models ranked our proposed method among the top 2 in terms of average rank correlation (DIVA-VQA-L: 0.898 and DIVA-VQA-B: 0.886). The improved performance is offered at a low runtime complexity, with DIVA-VQA-B ranked top and DIVA-VQA-L third on average compared to the fastest existing NR-VQA method. Code and models are publicly available at: https://github.com/xinyiW915/DIVA-VQA.
Problem

Research questions and friction points this paper is trying to address.

Detecting inter-frame variations in user-generated video quality
Assessing no-reference perceptual quality without pristine reference
Integrating spatio-temporal fragmentation for global and local analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Spatio-temporal fragmentation driven by inter-frame variations
Multi-level analysis of frames, patches and fragmented frames
Integration of 2D and 3D features for quality assessment
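As a rough illustration of the first two points, the sketch below ranks non-overlapping patches of a frame by their inter-frame residual energy and tiles the most-changed ones into a compact fragmented frame. This is not the authors' implementation: the function name, patch size, grid size, and the sum-of-absolute-differences ranking rule are all assumptions.

```python
import numpy as np

def fragment_by_residual(prev_frame, curr_frame, patch=32, grid=7):
    """Hypothetical sketch: keep the grid*grid patches of curr_frame with
    the largest inter-frame residual energy and tile them into a
    (grid*patch) x (grid*patch) 'fragmented frame'."""
    residual = np.abs(curr_frame.astype(np.float32)
                      - prev_frame.astype(np.float32))
    h, w = curr_frame.shape[:2]
    # score every non-overlapping patch by its residual energy
    scores = []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            scores.append((residual[y:y+patch, x:x+patch].sum(), y, x))
    scores.sort(reverse=True)  # most-changed patches first
    top = scores[:grid * grid]
    # assemble the selected patches into a compact fragmented frame
    frag = np.zeros((grid * patch, grid * patch), dtype=curr_frame.dtype)
    for i, (_, y, x) in enumerate(top):
        gy, gx = divmod(i, grid)
        frag[gy*patch:(gy+1)*patch, gx*patch:(gx+1)*patch] = \
            curr_frame[y:y+patch, x:x+patch]
    return frag
```

In this sketch, regions with little frame-to-frame change are simply discarded, so the downstream features concentrate on the quality-sensitive, dynamic parts of the video.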