Prompting Away Stereotypes? Evaluating Bias in Text-to-Image Models for Occupations

📅 2025-08-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study systematically evaluates social bias, particularly gender and racial stereotypes, in the occupational depictions produced by text-to-image (TTI) models. To this end, we construct a benchmark dataset covering five occupational categories and propose a fairness-aware prompt engineering framework. We conduct the first systematic comparison of five leading TTI models (DALL·E 3, Gemini Imagen 4.0, FLUX.1-dev, Stable Diffusion XL Turbo, and Grok-2 Image) on their responsiveness to diversity-promoting prompts. Human annotation analysis reveals that prompts significantly modulate demographic representation distributions; however, efficacy is highly model-dependent: some models diversify effectively, while others overcorrect or barely respond. Our findings demonstrate both the potential and the inherent limitations of prompt engineering as a lightweight bias-mitigation strategy, underscoring the need to co-design prompt interventions with architectural improvements to achieve robust fairness in generative vision-language systems.

📝 Abstract
Text-to-Image (TTI) models are powerful creative tools but risk amplifying harmful social biases. We frame representational societal bias assessment as an image curation and evaluation task and introduce a pilot benchmark of occupational portrayals spanning five socially salient roles (CEO, Nurse, Software Engineer, Teacher, Athlete). Using five state-of-the-art models, two closed-source (DALL·E 3, Gemini Imagen 4.0) and three open-source (FLUX.1-dev, Stable Diffusion XL Turbo, Grok-2 Image), we compare neutral baseline prompts against fairness-aware controlled prompts designed to encourage demographic diversity. All outputs are annotated for gender (male, female) and race (Asian, Black, White), enabling structured distributional analysis. Results show that prompting can substantially shift demographic representations, but with highly model-specific effects: some systems diversify effectively, others overcorrect into unrealistic uniformity, and some show little responsiveness. These findings highlight both the promise and the limitations of prompting as a fairness intervention, underscoring the need for complementary model-level strategies. We release all code and data for transparency and reproducibility: https://github.com/maximus-powers/img-gen-bias-analysis.
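The neutral-versus-controlled prompt design described in the abstract could be sketched as below. The exact prompt wording used in the released benchmark is an assumption here, not a quote from the paper:

```python
# Hypothetical sketch of the benchmark's prompt pairs: one neutral baseline
# prompt and one fairness-aware controlled prompt per occupation.
OCCUPATIONS = ["CEO", "nurse", "software engineer", "teacher", "athlete"]

def build_prompts(occupation: str) -> dict:
    """Return a neutral and a diversity-encouraging prompt for one role."""
    return {
        "neutral": f"A photo of a {occupation} at work.",
        "controlled": (
            f"A photo of a {occupation} at work. "
            "Depict a demographically diverse range of genders and ethnicities."
        ),
    }

# One prompt pair per occupational category in the benchmark.
prompts = {occ: build_prompts(occ) for occ in OCCUPATIONS}
```

Each prompt pair would then be sent to all five models, with the generated images passed on to human annotators.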
Problem

Research questions and friction points this paper is trying to address.

Evaluating occupational bias in text-to-image models
Assessing demographic diversity in AI-generated occupational portrayals
Testing prompting effectiveness for reducing model stereotypes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmarking occupational portrayals across models
Using fairness-aware prompts to encourage diversity
Annotating outputs for gender and race analysis
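The annotation-based distributional analysis could, for instance, measure how far each model's output distribution sits from demographic parity under each prompt condition. A minimal sketch, where the metric choice (total variation distance from uniform) and the example counts are assumptions for illustration, not the paper's data:

```python
from collections import Counter

def distribution(labels):
    """Normalize a list of annotation labels into a probability distribution."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def tv_distance_from_uniform(dist, categories):
    """Total variation distance between an observed distribution and uniform."""
    u = 1 / len(categories)
    return 0.5 * sum(abs(dist.get(c, 0.0) - u) for c in categories)

# Illustrative gender annotations for one occupation under the two
# prompt conditions (made-up counts, not results from the paper).
baseline = ["male"] * 9 + ["female"] * 1
controlled = ["male"] * 5 + ["female"] * 5

genders = ["male", "female"]
print(tv_distance_from_uniform(distribution(baseline), genders))    # 0.4
print(tv_distance_from_uniform(distribution(controlled), genders))  # 0.0
```

A score of 0 indicates perfect parity; comparing the baseline and controlled scores per model quantifies how responsive that model is to the fairness-aware prompt.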