Mitigation of Gender and Ethnicity Bias in AI-Generated Stories through Model Explanations

📅 2025-09-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses gender and ethnic representation biases in AI-generated narratives within occupational contexts. We propose BAME (Bias Analysis and Mitigation through Explanation), a model-interpretability-based bias analysis and mitigation framework. Unlike parameter fine-tuning or data resampling, BAME leverages causal explanations self-generated by large language models (Claude 3.5 Sonnet, Llama 3.1 70B Instruct, GPT-4 Turbo) to guide targeted prompt engineering, mitigating bias without altering model parameters. The methodology comprises multidimensional bias measurement, interpretability-driven prompt optimization, and systematic cross-model and cross-occupation evaluation. Experiments across 25 occupational domains show that BAME significantly improves demographic fairness in story generation, with representation balance increasing by 2–20% across metrics. The framework combines effectiveness, generalizability across models and occupations, and methodological transparency, offering a parameter-free, interpretable, and scalable approach to mitigating representational bias in generative AI storytelling.
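The "representation balance" improvements reported above can be illustrated with a minimal sketch. The paper does not specify its exact metric, so this uses total variation distance between observed demographic shares and a target distribution; the function name, sample labels, and target values are illustrative assumptions:

```python
from collections import Counter

def representation_gap(labels, target):
    """Total variation distance between observed demographic shares
    and a target distribution (e.g., uniform or census-based).
    0.0 means perfect parity; higher means more imbalance."""
    counts = Counter(labels)
    total = sum(counts.values())
    observed = {g: counts.get(g, 0) / total for g in target}
    return 0.5 * sum(abs(observed[g] - target[g]) for g in target)

# Hypothetical gender labels extracted from ten generated "nurse" stories,
# before and after a prompt-based mitigation step
before = ["female"] * 9 + ["male"]
after = ["female"] * 6 + ["male"] * 4
target = {"female": 0.5, "male": 0.5}

gap_before = representation_gap(before, target)  # 0.4
gap_after = representation_gap(after, target)    # 0.1
```

Under this toy measure, the mitigation narrows the gap from 0.4 to 0.1; the paper's 2–20% gains would be computed analogously over its own metrics and 25 occupations.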

📝 Abstract
Language models have been shown to propagate social bias through their output, particularly in the representation of gender and ethnicity. This paper investigates gender and ethnicity biases in AI-generated occupational stories. Representation biases are measured before and after applying our proposed mitigation strategy, Bias Analysis and Mitigation through Explanation (BAME), revealing improvements in demographic representation ranging from 2% to 20%. BAME leverages model-generated explanations to inform targeted prompt engineering, effectively reducing biases without modifying model parameters. By analyzing stories generated across 25 occupational groups, three large language models (Claude 3.5 Sonnet, Llama 3.1 70B Instruct, and GPT-4 Turbo), and multiple demographic dimensions, we identify persistent patterns of overrepresentation and underrepresentation linked to training data stereotypes. Our findings demonstrate that guiding models with their own internal reasoning mechanisms can significantly enhance demographic parity, thereby contributing to the development of more transparent generative AI systems.
Problem

Research questions and friction points this paper is trying to address.

Mitigating gender and ethnicity bias in AI-generated stories
Reducing representation bias through model explanation techniques
Addressing demographic stereotypes in language model outputs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging model explanations for bias mitigation
Targeted prompt engineering without parameter modification
Using internal reasoning to enhance demographic parity