🤖 AI Summary
This study addresses the limited generalization of blind image quality assessment (BIQA) models trained on synthetic data. It traces the problem to the distribution of the synthetic data rather than to model architecture: learned features form discrete clusters, with high-quality images grouped around their reference images and low-quality images grouped by distortion type, which hinders quality regression. To address this, the authors propose SynDR-IQA, a distribution reshaping framework grounded in a theoretical analysis of how sample diversity and redundancy affect generalization error. The framework combines two strategies: distribution-aware diverse content upsampling, which adds visual diversity while preserving the content distribution, and density-aware redundant cluster downsampling, which thins densely clustered regions. Experiments across three cross-dataset settings (synthetic-to-authentic, synthetic-to-algorithmic, and synthetic-to-synthetic) demonstrate the effectiveness of the approach.
📝 Abstract
Blind Image Quality Assessment (BIQA) has advanced significantly through deep learning, but the scarcity of large-scale labeled datasets remains a challenge. While synthetic data offers a promising solution, models trained on existing synthetic datasets often show limited generalization ability. In this work, we make a key observation: representations learned from synthetic datasets often exhibit a discrete, clustered pattern that hinders regression performance. Features of high-quality images cluster around reference images, while those of low-quality images cluster by distortion type. Our analysis reveals that this issue stems from the distribution of synthetic data rather than from model architecture. Consequently, we introduce a novel framework, SynDR-IQA, which reshapes the synthetic data distribution to enhance BIQA generalization. Based on a theoretical derivation of how sample diversity and redundancy affect generalization error, SynDR-IQA employs two strategies: distribution-aware diverse content upsampling, which enhances visual diversity while preserving the content distribution, and density-aware redundant cluster downsampling, which balances samples by reducing the density of densely clustered areas. Extensive experiments across three cross-dataset settings (synthetic-to-authentic, synthetic-to-algorithmic, and synthetic-to-synthetic) demonstrate the effectiveness of our method. The code is available at https://github.com/Li-aobo/SynDR-IQA.
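To make the idea of density-aware downsampling more concrete, the following is a minimal, illustrative sketch and not the SynDR-IQA implementation (see the linked repository for the authors' code). It assumes each sample is already represented by a feature vector, measures density as the number of neighbors within a fixed radius, and greedily keeps a sample only while fewer than a set number of already-kept samples lie nearby. The function name `density_aware_downsample`, the parameters `radius` and `max_neighbors`, and the toy 2-D features are all hypothetical choices made for this example.

```python
import math


def density_aware_downsample(features, radius, max_neighbors):
    """Return sorted indices of the samples to keep (illustrative sketch)."""
    n = len(features)
    # Local density: number of other samples within `radius` of each sample.
    density = [
        sum(
            1
            for j in range(n)
            if j != i and math.dist(features[i], features[j]) <= radius
        )
        for i in range(n)
    ]
    # Visit sparse samples first so that isolated samples are retained.
    order = sorted(range(n), key=lambda i: density[i])
    kept = []
    for i in order:
        # Count already-kept samples that fall inside this sample's radius.
        close = sum(
            1 for j in kept if math.dist(features[i], features[j]) <= radius
        )
        # Keep the sample only if its neighborhood is not yet saturated.
        if close < max_neighbors:
            kept.append(i)
    return sorted(kept)


# Toy data: one tight cluster of 50 points plus 3 isolated points.
dense = [(0.01 * i, 0.0) for i in range(50)]
sparse = [(5.0, 5.0), (-5.0, 5.0), (5.0, -5.0)]
features = dense + sparse

kept = density_aware_downsample(features, radius=0.5, max_neighbors=5)
print(kept)
```

In this toy setup the tight cluster is thinned to a handful of representatives while every isolated point survives, which mirrors the stated goal of reducing density in densely clustered areas. A practical version would likely operate on deep features and use an approximate nearest-neighbor index instead of the quadratic pairwise loop used here.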