🤖 AI Summary
This study investigates whether visual generative AI models can learn and reproduce the semantic and formal features of iconic images—widely recognized cultural symbols—as humans do. Method: We systematically evaluate the reconstruction capability of mainstream diffusion models on 128 iconic images using data attribution analysis, cross-modal semantic similarity metrics, and a three-stage user study. Contribution/Results: Despite abundant related training data, model outputs exhibit significantly lower structural fidelity and public recognizability than human expectations. Data attribution reveals minimal influence of iconic images on generation, indicating failure to establish stable semantic anchors. This work provides the first empirical evidence of a fundamental limitation in generative models’ capacity to encode “cultural memory.” It introduces a novel methodology for assessing AI’s visual cultural understanding and establishes a benchmark dataset for evaluating cultural semantics in generative vision models.
📝 Abstract
How humans interpret and produce images is shaped by the images we have been exposed to. Similarly, visual generative AI models are exposed to many training images and learn to generate new images from them. Given the importance of iconic images in human visual communication, being widely seen, reproduced, and used as inspiration, we might expect them to have a proportionally large influence within the generative AI process. In this work we explore this question through a three-part analysis involving data attribution, semantic similarity analysis, and a user study. Our findings indicate that iconic images do not have an obvious influence on the generative process, and that for many icons it is challenging to reproduce an image that closely resembles them. This highlights an important difference in how humans and visual generative AI models draw on and learn from prior visual communication.
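The semantic similarity analysis mentioned above is typically implemented by comparing the original icon and the generated image in a shared embedding space (e.g. from an encoder such as CLIP). The abstract does not specify the exact metric, so the following is a minimal sketch assuming cosine similarity over hypothetical embedding vectors; the encoder and the example vectors are assumptions, not the authors' implementation:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings: in practice these would come from an image
# encoder applied to the iconic image and to the model's reconstruction.
icon_embedding = np.array([0.9, 0.1, 0.3])
generated_embedding = np.array([0.2, 0.8, 0.4])

score = cosine_similarity(icon_embedding, generated_embedding)
print(f"semantic similarity: {score:.3f}")
```

A low score under such a metric would indicate that the generated image fails to capture the icon's semantic content, which is the kind of evidence the study reports.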