🤖 AI Summary
To address the need for low-overhead, interpretable reduced-reference video quality assessment (VQA), this paper proposes a lightweight method that requires no deep learning. The approach fuses Video Complexity Analyzer (VCA) features with structural similarity (SSIM)-based residuals, builds frame-level residual representations, applies temporal pooling, and feeds the pooled features into an XGBoost regression model. This work is the first to jointly leverage video complexity analysis and structural residual modeling for reduced-reference VQA without neural networks or GPU acceleration, yielding high interpretability and real-time performance. Evaluated on the VQA Grand Challenge dataset, the method achieves PLCC and SRCC scores exceeding 0.92 with inference latency under 10 ms per frame, enabling real-time quality monitoring of 4K video streams.
📝 Abstract
This paper presents a novel approach for reduced-reference video quality assessment (VQA), developed as part of the recent VQA Grand Challenge. Our method leverages low-level complexity and structural information from reference and test videos to predict perceptual quality scores. Specifically, we extract spatio-temporal features using the Video Complexity Analyzer (VCA) and compute SSIM values from the test video to capture both texture and structural characteristics. These features are aggregated through temporal pooling, and residual features are calculated by comparing the original and distorted feature sets. The combined features are used to train an XGBoost regression model that estimates the overall video quality. The pipeline is fully automated, interpretable, and highly scalable, requiring no deep neural networks or GPU inference. Experimental results on the challenge dataset demonstrate that our proposed method achieves competitive correlation with subjective quality scores while maintaining a low computational footprint. The model's lightweight design and strong generalization make it well suited to real-time streaming quality monitoring and adaptive encoding scenarios.
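The abstract's feature pipeline (per-frame residuals between reference and test feature series, temporal pooling, then regression) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature names (`spatial_complexity`, `temporal_complexity`), the absolute-difference residual, and the choice of pooling statistics are assumptions for the sake of a concrete example.

```python
# Hedged sketch of the described pipeline: frame-level VCA-style features
# and per-frame SSIM values are pooled over time into a clip-level vector.
# Feature names and pooling statistics are illustrative assumptions.
from statistics import mean, stdev

def residual_features(ref_series, test_series):
    """Per-frame absolute residuals between reference and test feature series."""
    return [abs(r - t) for r, t in zip(ref_series, test_series)]

def temporal_pool(values):
    """Collapse a per-frame series into clip-level statistics."""
    return {
        "mean": mean(values),
        "std": stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }

def build_feature_vector(ref_vca, test_vca, ssim_per_frame):
    """Assemble the pooled feature vector used for quality regression."""
    feats = {}
    for name in ref_vca:  # e.g. spatial/temporal complexity series
        res = residual_features(ref_vca[name], test_vca[name])
        for stat, v in temporal_pool(res).items():
            feats[f"{name}_res_{stat}"] = v
    for stat, v in temporal_pool(ssim_per_frame).items():
        feats[f"ssim_{stat}"] = v
    return feats

# Toy example: four frames of hypothetical complexity values.
ref = {"spatial_complexity": [10.0, 11.0, 10.5, 10.2],
       "temporal_complexity": [0.8, 0.9, 0.85, 0.8]}
test = {"spatial_complexity": [9.0, 10.5, 9.8, 9.9],
        "temporal_complexity": [0.7, 0.95, 0.8, 0.75]}
ssim = [0.92, 0.95, 0.94, 0.93]

fv = build_feature_vector(ref, test, ssim)
print(sorted(fv))  # 2 features x 4 stats + 4 SSIM stats = 12 entries
```

The resulting vector would then be the input to a gradient-boosted regressor trained against subjective scores (e.g. `xgboost.XGBRegressor` fit on pooled vectors and MOS labels), which keeps the whole pipeline CPU-only and interpretable via per-feature importances.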