Exposing Blindspots: Cultural Bias Evaluation in Generative Image Models

📅 2025-10-22
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Prior work on cultural bias in generative image models focuses predominantly on text-to-image (T2I) synthesis, overlooking cultural fidelity degradation in image-to-image (I2I) editing. Method: We introduce a novel, era-aware cultural evaluation framework spanning six countries, eight categories, and 36 subcategories across multiple historical periods. It unifies T2I and I2I assessment via temporally grounded prompts and integrates automated metrics, culturally aware retrieval-augmented visual question answering (VQA), and expert-led human evaluation by local domain specialists. Contribution/Results: This is the first cross-national, cross-epoch, and cross-category cultural bias benchmark for generative image models. Experiments reveal strong model preference for Global North modern aesthetics under nationality-agnostic prompts; I2I editing consistently erodes cultural authenticity and perpetuates stereotypical representations of Global South nations. We publicly release our dataset, prompt templates, and evaluation protocol to advance culturally sensitive image generation research.

๐Ÿ“ Abstract
Generative image models produce striking visuals yet often misrepresent culture. Prior work has examined cultural bias mainly in text-to-image (T2I) systems, leaving image-to-image (I2I) editors underexplored. We bridge this gap with a unified evaluation across six countries, an 8-category/36-subcategory schema, and era-aware prompts, auditing both T2I generation and I2I editing under a standardized protocol that yields comparable diagnostics. Using open models with fixed settings, we derive cross-country, cross-era, and cross-category evaluations. Our framework combines standard automatic metrics, a culture-aware retrieval-augmented VQA, and expert human judgments collected from native reviewers. To enable reproducibility, we release the complete image corpus, prompts, and configurations. Our study reveals three findings: (1) under country-agnostic prompts, models default to Global-North, modern-leaning depictions that flatten cross-country distinctions; (2) iterative I2I editing erodes cultural fidelity even when conventional metrics remain flat or improve; and (3) I2I models apply superficial cues (palette shifts, generic props) rather than era-consistent, context-aware changes, often retaining source identity for Global-South targets. These results highlight that culture-sensitive edits remain unreliable in current systems. By releasing standardized data, prompts, and human evaluation protocols, we provide a reproducible, culture-centered benchmark for diagnosing and tracking cultural bias in generative image models.
Problem

Research questions and friction points this paper is trying to address.

Evaluating cultural bias in generative image models across six countries
Assessing cultural fidelity erosion during iterative image-to-image editing
Distinguishing superficial cultural cues from era-consistent, context-aware edits
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified evaluation across six countries and eras
Combining automatic metrics with culture-aware VQA
Standardized reproducible benchmark for cultural bias
Authors

Huichan Seo
Carnegie Mellon University, Pittsburgh, United States

Sieun Choi
Dongguk University, Seoul, South Korea

Minki Hong
Korea Advanced Institute of Science and Technology
Human-Computer Interaction, Computational Interaction

Yi Zhou
Carnegie Mellon University, Pittsburgh, United States

Junseo Kim
Delft University of Technology, Delft, Netherlands

Lukman Ismaila
Johns Hopkins University, School of Medicine, Baltimore, United States

Naome Etori
University of Minnesota–Twin Cities, Minneapolis, United States

Mehul Agarwal
Carnegie Mellon University, Pittsburgh, United States

Zhixuan Liu
PhD student at Shanghai Jiaotong University
Deep Learning, Reinforcement Learning

Jihie Kim
Dongguk University
Artificial Intelligence, Computer Education, Human-Computer Interaction, NLP

Jean Oh
Robotics Institute, Carnegie Mellon University
Robotics, Multimodal Perception, Social Navigation, Language-Vision Intersection, Artificial Intelligence