Exposing Blindspots: Cultural Bias Evaluation in Generative Image Models

Date: 2025-10-22
Citations: 0
Influential: 0

AI Summary
Problem: Prior work on cultural bias in generative image models focuses predominantly on text-to-image (T2I) synthesis and overlooks the loss of cultural fidelity in image-to-image (I2I) editing. Method: We introduce an era-aware cultural evaluation framework spanning six countries, eight categories, and 36 subcategories across multiple historical periods. It unifies T2I and I2I assessment via temporally grounded prompts and combines standard automatic metrics, culture-aware retrieval-augmented visual question answering (VQA), and expert human evaluation by native reviewers. Contribution/Results: This is the first cross-national, cross-epoch, and cross-category cultural bias benchmark for generative image models. Experiments show that, under country-agnostic prompts, models default to modern, Global-North depictions that flatten cross-country distinctions; iterative I2I editing erodes cultural fidelity even when conventional metrics stay flat or improve, and I2I models apply superficial cues rather than era-consistent changes, often retaining the source identity for Global-South targets. We publicly release our dataset, prompt templates, and evaluation protocol to advance culturally sensitive image generation research.

Abstract
Generative image models produce striking visuals yet often misrepresent culture. Prior work has examined cultural bias mainly in text-to-image (T2I) systems, leaving image-to-image (I2I) editors underexplored. We bridge this gap with a unified evaluation across six countries, an 8-category/36-subcategory schema, and era-aware prompts, auditing both T2I generation and I2I editing under a standardized protocol that yields comparable diagnostics. Using open models with fixed settings, we derive cross-country, cross-era, and cross-category evaluations. Our framework combines standard automatic metrics, a culture-aware retrieval-augmented VQA, and expert human judgments collected from native reviewers. To enable reproducibility, we release the complete image corpus, prompts, and configurations. Our study reveals three findings: (1) under country-agnostic prompts, models default to Global-North, modern-leaning depictions that flatten cross-country distinctions; (2) iterative I2I editing erodes cultural fidelity even when conventional metrics remain flat or improve; and (3) I2I models apply superficial cues (palette shifts, generic props) rather than era-consistent, context-aware changes, often retaining source identity for Global-South targets. These results highlight that culture-sensitive edits remain unreliable in current systems. By releasing standardized data, prompts, and human evaluation protocols, we provide a reproducible, culture-centered benchmark for diagnosing and tracking cultural bias in generative image models.
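
As a rough illustration of what a country, era, and category prompt grid with a country-agnostic baseline could look like, here is a minimal Python sketch. The abstract does not name the six countries, the eras, or the category labels, and does not give the prompt wording, so every value below (`COUNTRIES`, `ERAS`, `SUBCATEGORIES`, the `PromptSpec` class, and the sentence template) is a hypothetical placeholder rather than the released prompt set.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Optional, Sequence

# Hypothetical placeholders: the source does not list the actual countries,
# eras, or category names, so these values are for illustration only.
COUNTRIES = ["Country A", "Country B"]
ERAS = ["traditional", "modern"]
SUBCATEGORIES = {"food": ["street food"], "architecture": ["house"]}


@dataclass(frozen=True)
class PromptSpec:
    """One cell of the evaluation grid."""
    country: Optional[str]  # None marks a country-agnostic prompt
    era: str
    category: str
    subcategory: str

    def text(self) -> str:
        # Assumed template; the real prompt wording is not given in the source.
        place = f" in {self.country}" if self.country else ""
        return f"A photo of {self.subcategory}{place}, {self.era} era"


def build_grid(
    countries: Sequence[str],
    eras: Sequence[str],
    subcategories: Dict[str, List[str]],
    include_agnostic: bool = True,
) -> List[PromptSpec]:
    """Cross every country (plus an optional agnostic slot) with every era
    and every (category, subcategory) pair."""
    targets: List[Optional[str]] = list(countries)
    if include_agnostic:
        targets.append(None)
    pairs = [(cat, sub) for cat, subs in subcategories.items() for sub in subs]
    return [
        PromptSpec(country, era, cat, sub)
        for country, era, (cat, sub) in product(targets, eras, pairs)
    ]


if __name__ == "__main__":
    for spec in build_grid(COUNTRIES, ERAS, SUBCATEGORIES):
        print(spec.text())
```

Keeping the country-agnostic prompt as an explicit `None` slot in the same grid makes it straightforward to compare a model's default depiction against each country-specific one for the same era and subcategory.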

Problem

Research questions and friction points this paper is trying to address.

Evaluating cultural bias in generative image models across six countries
Assessing cultural fidelity erosion during iterative image-to-image editing
Analyzing superficial cultural cues versus context-aware model changes
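
To make the second point concrete, the sketch below shows one way to track scores along a chain of repeated I2I edits and flag the steps where a conventional metric holds steady or improves while a cultural-fidelity score drops. It is only an assumption-laden outline: `edit_fn` and the scorer callables are hypothetical stand-ins for an editing model and for whatever metrics are used, and the paper's actual protocol may differ.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

Image = TypeVar("Image")


def run_edit_chain(
    source: Image,
    edit_fn: Callable[[Image], Image],
    scorers: Dict[str, Callable[[Image], float]],
    steps: int,
) -> Dict[str, List[float]]:
    """Apply edit_fn repeatedly and record every scorer at each step.

    Index 0 of each score list is the unedited source image.
    edit_fn and the scorers are hypothetical placeholders supplied by the caller.
    """
    history = {name: [fn(source)] for name, fn in scorers.items()}
    current = source
    for _ in range(steps):
        current = edit_fn(current)
        for name, fn in scorers.items():
            history[name].append(fn(current))
    return history


def diverging_steps(
    conventional: Sequence[float],
    cultural: Sequence[float],
    tol: float = 0.0,
) -> List[int]:
    """Return step indices where the conventional score did not fall
    (within tol) while the cultural score fell by more than tol,
    each compared with the previous step."""
    return [
        i
        for i in range(1, min(len(conventional), len(cultural)))
        if conventional[i] >= conventional[i - 1] - tol
        and cultural[i] < cultural[i - 1] - tol
    ]


if __name__ == "__main__":
    # Toy stand-ins: an "image" is just an edit counter here.
    scores = run_edit_chain(
        source=0,
        edit_fn=lambda img: img + 1,
        scorers={
            "conventional": lambda img: 0.5 + 0.01 * img,
            "cultural": lambda img: 1.0 - 0.1 * img,
        },
        steps=3,
    )
    print(diverging_steps(scores["conventional"], scores["cultural"]))
```

Both scorers are assumed to be "higher is better"; a metric with the opposite direction would need to be negated before being passed in.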

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified evaluation across six countries and eras
Combining automatic metrics with culture-aware VQA
Standardized reproducible benchmark for cultural bias