🤖 AI Summary
Existing methods for modeling human perception of image visual complexity lack interpretability and cognitive alignment, failing to jointly account for structural and semantic dimensions. Method: We systematically decouple visual complexity into two orthogonal components: structural complexity (captured via multi-scale Sobel gradients and multi-scale unique color counts) and semantic complexity (measured by scene-level surprise derived from Visual Genome). We introduce SVG, the first dataset explicitly designed for semantic complexity modeling, and propose a lightweight, interpretable, data-driven quantification framework that avoids opaque deep learning models. Contribution/Results: Experiments demonstrate that integrating both dimensions significantly improves prediction accuracy and cross-dataset generalization. Structural features predominantly govern low-level perceptual processing, while semantic surprise enhances high-level conceptual understanding. All code, data, and configurations will be released publicly.
📝 Abstract
Understanding how humans perceive visual complexity is a key area of study in visual cognition. Previous approaches to modeling visual complexity have often resulted in intricate, difficult-to-interpret solutions that employ numerous features or sophisticated deep learning architectures. While these complex models achieve high performance on specific datasets, they often sacrifice interpretability, making it difficult to understand which factors drive human perception of complexity. A recent model based on image segmentations showed promise in addressing this challenge; however, it was limited in capturing the structural and semantic aspects of visual complexity. In this paper, we propose viable and effective features to overcome these shortcomings. Specifically, we develop multiscale features for the structural aspect of complexity: the Multiscale Sobel Gradient (MSG), which captures spatial intensity variations across scales, and Multiscale Unique Colors (MUC), which quantifies image colorfulness by indexing quantized RGB values. We also introduce a new dataset, SVG, based on Visual Genome to explore the semantic aspect of visual complexity, deriving surprise scores that quantify unexpected scene content and that we show contribute significantly to perceived complexity. Overall, we suggest that the nature of the data is fundamental to understanding and modeling visual complexity, highlighting the importance of both structural and semantic dimensions in providing a comprehensive, interpretable assessment. The code for our analysis, experimental setup, and dataset will be made publicly available upon acceptance.
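The two structural features described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the scale set, downsampling method, quantization depth (`bits`), and aggregation (averaging across scales) are all assumptions made here for concreteness.

```python
import numpy as np

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def _filter2(img, k):
    # Valid-mode 2-D cross-correlation with a 3x3 kernel,
    # expressed as a sum of shifted slices (no SciPy dependency).
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(3):
        for j in range(3):
            out += k[i, j] * img[i:i + h - 2, j:j + w - 2]
    return out

def msg(gray, scales=(1, 2, 4)):
    """Multiscale Sobel Gradient (sketch): mean gradient magnitude,
    averaged over downsampled copies of a grayscale image.
    `scales` and the decimation scheme are illustrative assumptions."""
    vals = []
    for s in scales:
        g = gray[::s, ::s]                      # naive decimation
        gx, gy = _filter2(g, SOBEL_X), _filter2(g, SOBEL_Y)
        vals.append(np.mean(np.hypot(gx, gy)))  # gradient magnitude
    return float(np.mean(vals))

def muc(rgb, scales=(1, 2, 4), bits=5):
    """Multiscale Unique Colors (sketch): count distinct quantized RGB
    triples (top `bits` bits per channel) at each scale, then average."""
    counts = []
    for s in scales:
        q = (rgb[::s, ::s] >> (8 - bits)).astype(np.uint32)
        # Pack the three quantized channels into one index per pixel.
        idx = (q[..., 0] << (2 * bits)) | (q[..., 1] << bits) | q[..., 2]
        counts.append(len(np.unique(idx)))
    return float(np.mean(counts))
```

Both measures behave as the abstract suggests: a flat image yields zero gradient energy and a single unique color, while textured, colorful images score higher on both.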