AI Summary
This study identifies a previously unexamined class of cross-modal bias: the reproduction of cultural stereotypes in text-to-image generation by vision-language models (VLMs), for example "owl = wisdom" and "fox = cunning". The authors run controlled generation experiments with DALL-E using fine-grained prompt engineering, and validate the outputs with a mixed-method framework that combines qualitative analysis and statistical quantification. The results indicate that VLMs consistently amplify pre-existing human associations with cultural symbols, pointing to a previously uncharacterized mechanism of bias propagation across modalities. The work extends AI fairness research beyond sociodemographic attributes to culturally embedded symbolic representations, and it offers both theoretical grounding and a reproducible methodology for ethical assessment and debiasing of multimodal generative systems.
Abstract
Animal stereotypes are deeply embedded in human culture and language, and they often shape our perceptions and expectations of various species. Our study investigates how animal stereotypes manifest in vision-language models during image generation. Through targeted prompts, we explore whether DALL-E perpetuates stereotypical representations of animals, such as "owls as wise," "foxes as unfaithful," etc. Our findings reveal significant instances of stereotyping in which the model consistently generates images aligned with cultural biases. The current work is the first of its kind to systematically examine animal stereotyping in vision-language models and to highlight a critical yet underexplored dimension of bias in AI-generated visual content.