Mitigation of Gender and Ethnicity Bias in AI-Generated Stories through Model Explanations

📅 2025-09-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses gender and ethnic representation biases in AI-generated narratives within occupational contexts. We propose BAME (Bias Analysis and Mitigation through Explanation), a model-interpretability-based bias analysis and mitigation framework. Unlike parameter fine-tuning or data resampling, BAME leverages causal explanations self-generated by large language models (Claude 3.5 Sonnet, Llama 3.1 70B Instruct, GPT-4 Turbo) to guide targeted prompt engineering, mitigating bias without altering model parameters. The methodology comprises multidimensional bias measurement, interpretability-driven prompt optimization, and systematic cross-model and cross-occupation evaluation. Experiments across 25 occupational domains show that BAME significantly improves demographic fairness in story generation, with representation balance increasing by 2–20% across metrics. The framework combines effectiveness, generalizability across models and occupations, and methodological transparency, offering a parameter-free, interpretable, and scalable approach to mitigating representational bias in generative AI storytelling.
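The "representation balance" improvements reported above can be illustrated with a minimal sketch. The paper does not specify its exact metric, so this uses total variation distance between observed demographic shares and a target distribution; the function name, sample labels, and target values are illustrative assumptions:

```python
from collections import Counter

def representation_gap(labels, target):
    """Total variation distance between observed demographic shares
    and a target distribution (e.g., uniform or census-based).
    0.0 means perfect parity; higher means more imbalance."""
    counts = Counter(labels)
    total = sum(counts.values())
    observed = {g: counts.get(g, 0) / total for g in target}
    return 0.5 * sum(abs(observed[g] - target[g]) for g in target)

# Hypothetical gender labels extracted from ten generated "nurse" stories,
# before and after a prompt-based mitigation step
before = ["female"] * 9 + ["male"]
after = ["female"] * 6 + ["male"] * 4
target = {"female": 0.5, "male": 0.5}

gap_before = representation_gap(before, target)  # 0.4
gap_after = representation_gap(after, target)    # 0.1
```

Under this toy measure, the mitigation narrows the gap from 0.4 to 0.1; the paper's 2–20% gains would be computed analogously over its own metrics and 25 occupations.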

📝 Abstract
Language models have been shown to propagate social bias through their output, particularly in the representation of gender and ethnicity. This paper investigates gender and ethnicity biases in AI-generated occupational stories. Representation biases are measured before and after applying our proposed mitigation strategy, Bias Analysis and Mitigation through Explanation (BAME), revealing improvements in demographic representation ranging from 2% to 20%. BAME leverages model-generated explanations to inform targeted prompt engineering, effectively reducing biases without modifying model parameters. By analyzing stories generated across 25 occupational groups, three large language models (Claude 3.5 Sonnet, Llama 3.1 70B Instruct, and GPT-4 Turbo), and multiple demographic dimensions, we identify persistent patterns of overrepresentation and underrepresentation linked to training data stereotypes. Our findings demonstrate that guiding models with their own internal reasoning mechanisms can significantly enhance demographic parity, thereby contributing to the development of more transparent generative AI systems.
Problem

Research questions and friction points this paper is trying to address.

Mitigating gender and ethnicity bias in AI-generated stories
Reducing representation bias through model explanation techniques
Addressing demographic stereotypes in language model outputs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging model explanations for bias mitigation
Targeted prompt engineering without parameter modification
Using internal reasoning to enhance demographic parity