🤖 AI Summary
This work addresses blind image quality assessment for ultra-high-definition (UHD) images, where full-resolution inference is computationally expensive and naive downsampling or isolated patch cropping can suppress scale-sensitive distortions and weaken global-local dependencies. The authors propose UHD-GCN-BIQA, a graph neural network approach that encodes sampled, aspect-ratio-aligned image patches as nodes and constructs a hybrid k-nearest-neighbor graph from spatial proximity and feature similarity. Residual graph convolutions propagate contextual information across regions, and gated attention pooling aggregates regional evidence into an image-level quality prediction. Rather than treating patches as independent views, the method explicitly models inter-regional structural dependencies, and it uses a multi-objective loss with exponential moving average normalization to jointly optimize regression, correlation, and ranking objectives. On the UHD-IQA benchmark, it achieves PLCC = 0.7784, SRCC = 0.8019, and RMSE = 0.0519, giving competitive correlation performance and the lowest RMSE among the compared methods, which points to improved absolute quality score estimation.
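To make the hybrid k-nearest-neighbor graph concrete, the following is a minimal sketch, not the authors' implementation. It assumes each patch has a 2-D center coordinate and a feature vector, and it assumes the two cues are mixed as a weighted sum of a normalized spatial distance and a cosine feature distance. The function name `hybrid_knn_edges`, the mixing weight `alpha`, and the mixing rule itself are hypothetical; the summary does not state how the paper combines the two terms.

```python
import math


def hybrid_knn_edges(centers, features, k=2, alpha=0.5):
    """Return directed edges (i, j) linking each patch i to its k nearest patches.

    Hypothetical scoring rule (an assumption, not taken from the paper):
    alpha * normalized spatial distance + (1 - alpha) * cosine feature distance.
    """
    n = len(centers)
    if n != len(features):
        raise ValueError("centers and features must have the same length")
    if n == 0:
        return []

    def cosine_distance(u, v):
        # 1 - cosine similarity; lies in [0, 2], and is 1.0 for a zero vector.
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return 1.0 - dot / norm if norm > 0 else 1.0

    # Pairwise distances between patch centers, scaled to [0, 1].
    spatial = [[math.dist(centers[i], centers[j]) for j in range(n)] for i in range(n)]
    max_spatial = max(max(row) for row in spatial) or 1.0

    edges = []
    for i in range(n):
        scores = []
        for j in range(n):
            if i == j:
                continue  # no self-loops in this sketch
            d_spatial = spatial[i][j] / max_spatial
            d_feature = cosine_distance(features[i], features[j])
            scores.append((alpha * d_spatial + (1.0 - alpha) * d_feature, j))
        # Keep the k neighbors with the smallest hybrid distance.
        for _, j in sorted(scores)[:k]:
            edges.append((i, j))
    return edges
```

With `alpha=1.0` the graph reduces to a purely spatial neighborhood, and with `alpha=0.0` it links patches that look alike regardless of where they sit in the image. The hybrid setting lets a patch exchange information with both kinds of neighbor.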
📝 Abstract
Blind image quality assessment (BIQA) for ultra-high-definition (UHD) images remains challenging because native-resolution inference is computationally expensive, whereas aggressive resizing or isolated cropping may suppress scale-sensitive distortions and weaken the relationship between local artifacts and global scene context. This paper aims to improve UHD-BIQA by explicitly modeling the structural dependencies among sampled image regions rather than treating them as independent views. To this end, a graph representation learning framework, UHD-GCN-BIQA, is proposed. The framework samples aspect-ratio-aligned patches from each UHD image, encodes them as graph nodes, and constructs a hybrid k-nearest-neighbor graph using spatial proximity and feature similarity. Residual graph convolution propagates contextual information across regions, and gated attention pooling aggregates patch-level evidence into an image-level quality prediction. A multi-objective loss function normalized by exponential moving averages is adopted to stabilize the joint optimization of regression, correlation, and ranking objectives. Experiments on the UHD-IQA benchmark show that UHD-GCN-BIQA achieves PLCC = 0.7784, SRCC = 0.8019, and RMSE = 0.0519, obtaining competitive correlation performance and the lowest RMSE among the compared methods. These results indicate that graph-based modeling of region relations is effective for UHD image quality assessment, particularly for improving absolute quality score estimation on high-resolution visual content.
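One way to read the normalized multi-objective loss is sketched below. It assumes each loss term is divided by an exponential moving average (EMA) of its own magnitude, so that the regression, correlation, and ranking terms contribute on comparable scales. The class name `EmaLossNormalizer`, the `decay` value, and the division rule are hypothetical; the abstract does not give the exact formulation.

```python
class EmaLossNormalizer:
    """Scale each loss term by a running (EMA) estimate of its own magnitude.

    Illustrative assumption only: the paper's exact normalization is not
    specified in the abstract.
    """

    def __init__(self, names, decay=0.9, eps=1e-8):
        self.decay = decay
        self.eps = eps
        self.ema = {name: None for name in names}

    def combine(self, losses):
        """Update the EMAs with `losses` (name -> float) and return the normalized sum."""
        total = 0.0
        for name, value in losses.items():
            previous = self.ema[name]
            # Initialize the EMA with the first observed magnitude, then blend.
            if previous is None:
                current = abs(value)
            else:
                current = self.decay * previous + (1.0 - self.decay) * abs(value)
            self.ema[name] = current
            # In an autograd framework the EMA would be treated as a constant
            # (detached), so gradients flow only through `value`.
            total += value / (current + self.eps)
        return total


normalizer = EmaLossNormalizer(["regression", "correlation", "ranking"])
step_loss = normalizer.combine({"regression": 0.01, "correlation": 0.5, "ranking": 2.0})
```

On the first step every normalized term is close to 1, so no single objective dominates simply because its raw scale is larger. Later steps reweight each term according to how it has moved relative to its own recent history.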