🤖 AI Summary
This study examines the reliance of contemporary vision-based generative AI on a single standard of "aesthetics" rooted in Western-centric and gendered biases, one that reinforces imperial and male gazes while systematically marginalizing LGBTQ+ and male subjects. Through an integration of large-scale model auditing and digital ethnography, the research critically examines the selection mechanisms and cultural underpinnings of the LAION Aesthetic Predictor (LAP). Cross-cultural image scoring and dataset analyses reveal that LAP significantly favors female-centric content and realistic landscapes, cityscapes, and portraits from Western and Japanese contexts, while devaluing LGBTQ+ and male-oriented imagery, thereby reproducing historical power structures embedded in traditional art canons. Building on these findings, the work proposes an ethical shift from monolithic aesthetic norms toward pluralistic evaluation paradigms, offering both theoretical grounding and practical guidance for designing fairer generative AI systems.
📝 Abstract
Visual generative AI models are trained using a one-size-fits-all measure of aesthetic appeal. However, what is deemed "aesthetic" is inextricably linked to personal taste and cultural values, raising the question of whose taste is represented in visual generative AI models. In this work, we study an aesthetic evaluation model, the LAION Aesthetic Predictor (LAP), that is widely used both to curate the datasets that train generative image models such as Stable Diffusion and to evaluate the quality of AI-generated images. To understand what LAP measures, we audited the model across three datasets. First, we examined the impact of aesthetic filtering on the LAION-Aesthetics Dataset (approximately 1.2B images), which was curated from LAION-5B using LAP. We find that LAP disproportionately filters in images whose captions mention women, while filtering out images whose captions mention men or LGBTQ+ people. Then, we used LAP to score approximately 330k images across two art datasets, finding that the model rates realistic images of landscapes, cityscapes, and portraits by Western and Japanese artists most highly. The algorithmic gaze of this aesthetic evaluation model thereby reinforces the imperial and male gazes found within Western art history. To understand where these biases may have originated, we performed a digital ethnography of public materials related to the creation of LAP. We find that the development of LAP reflects the biases surfaced in our audits; for example, the aesthetic scores used to train LAP came primarily from English-speaking photographers and Western AI enthusiasts. In response, we discuss how aesthetic evaluation can perpetuate representational harms and call on AI developers to shift away from prescriptive measures of "aesthetics" toward more pluralistic evaluation.
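As context for the scoring audits described above, the sketch below illustrates how a CLIP-based aesthetic predictor in the style of LAP assigns a scalar score to an image: embed the image with CLIP, L2-normalize the embedding, and apply a small regression head. This is a minimal sketch under stated assumptions, not the paper's released artifact: the single-linear-layer head and the weights file `aesthetic_head.pt` are illustrative placeholders.

```python
# Minimal sketch of a CLIP-based aesthetic predictor in the style of LAP.
# Assumptions (not from the paper): a linear regression head and a local
# weights file "aesthetic_head.pt" holding its state dict.
import torch
import open_clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# CLIP ViT-L/14 image encoder, the kind of backbone LAP-style predictors use.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-L-14", pretrained="openai"
)
model = model.to(device).eval()

# Hypothetical pretrained head: maps the 768-dim CLIP embedding to one
# scalar aesthetic score (roughly a 1-10 scale in LAP-style predictors).
head = torch.nn.Linear(768, 1)
head.load_state_dict(torch.load("aesthetic_head.pt"))  # assumed weights file
head = head.to(device).eval()

@torch.no_grad()
def aesthetic_score(path: str) -> float:
    """Embed an image with CLIP, normalize, and regress a scalar score."""
    image = preprocess(Image.open(path).convert("RGB")).unsqueeze(0).to(device)
    emb = model.encode_image(image)
    emb = emb / emb.norm(dim=-1, keepdim=True)  # L2-normalize the embedding
    return head(emb).item()

# Dataset curation then reduces to thresholding: keep images scoring above
# some cutoff, which is exactly the filtering step the audit interrogates.
print(aesthetic_score("example.jpg"))
```

Because curation amounts to thresholding these scores, any bias in the head's training labels propagates directly into which images are kept or discarded.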