AI Summary
Prior work on cultural bias in generative image models focuses predominantly on text-to-image (T2I) synthesis, overlooking cultural fidelity degradation in image-to-image (I2I) editing. Method: We introduce a novel, era-aware cultural evaluation framework spanning six countries, eight categories, and 36 subcategories across multiple historical periods. It unifies T2I and I2I assessment via temporally grounded prompts and integrates automated metrics, culturally aware retrieval-augmented visual question answering (VQA), and expert-led human evaluation by local domain specialists. Contribution/Results: This is the first cultural bias benchmark for generative image models that is simultaneously cross-national, cross-epoch, and cross-category. Experiments reveal a strong model preference for Global North, modern aesthetics under nationality-agnostic prompts; I2I editing consistently erodes cultural authenticity and perpetuates stereotypical representations of Global South nations. We publicly release our dataset, prompt templates, and evaluation protocol to advance culturally sensitive image generation research.
Abstract
Generative image models produce striking visuals yet often misrepresent culture. Prior work has examined cultural bias mainly in text-to-image (T2I) systems, leaving image-to-image (I2I) editors underexplored. We bridge this gap with a unified evaluation across six countries, an eight-category, 36-subcategory schema, and era-aware prompts, auditing both T2I generation and I2I editing under a standardized protocol that yields comparable diagnostics. Using open models with fixed settings, we derive cross-country, cross-era, and cross-category evaluations. Our framework combines standard automatic metrics, culture-aware retrieval-augmented VQA, and expert human judgments collected from native reviewers. To enable reproducibility, we release the complete image corpus, prompts, and configurations. Our study reveals three findings: (1) under country-agnostic prompts, models default to Global-North, modern-leaning depictions that flatten cross-country distinctions; (2) iterative I2I editing erodes cultural fidelity even when conventional metrics remain flat or improve; and (3) I2I models apply superficial cues (palette shifts, generic props) rather than era-consistent, context-aware changes, often retaining the source image's identity when editing toward Global-South targets. These results show that culture-sensitive edits remain unreliable in current systems. By releasing standardized data, prompts, and human evaluation protocols, we provide a reproducible, culture-centered benchmark for diagnosing and tracking cultural bias in generative image models.