🤖 AI Summary
This study investigates linguistic agency bias in large language model (LLM) outputs—specifically, disparities in the attribution of agency (e.g., action-oriented, agentic language) across gender, race, and intersectional identity groups. To address the lack of dedicated evaluation resources, we introduce LABE, the first benchmark explicitly designed for agency bias assessment, covering biography generation, professor reviews, and reference letters. We empirically demonstrate that LLMs exhibit significantly higher agency bias along intersectional dimensions than along gender or race alone, and greater gender bias than human-written texts. To mitigate this, we propose Mitigation via Selective Rewrite (MSR), a classifier-guided, fine-grained, and controllable debiasing method that selectively rewrites biased spans. Experiments on ChatGPT, Llama3, and Mistral show MSR reduces agency bias by 37.2% on average over prompt-engineering baselines, improves output stability by 3.1×, and preserves generation quality. This work establishes a new paradigm for quantifying and precisely mitigating linguistic agency bias in LLMs.
📝 Abstract
Social biases can manifest in language agency. However, very limited research has investigated such biases in Large Language Model (LLM)-generated content. In addition, previous works often rely on string-matching techniques to identify agentic and communal words within texts, falling short of accurately classifying language agency. We introduce the Language Agency Bias Evaluation (LABE) benchmark, which comprehensively evaluates biases in LLMs by analyzing the agency levels attributed to different demographic groups in model generations. LABE tests for gender, racial, and intersectional language agency biases in LLMs on 3 text generation tasks: biographies, professor reviews, and reference letters. Using LABE, we unveil language agency social biases in 3 recent LLMs: ChatGPT, Llama3, and Mistral. We observe that: (1) LLM generations tend to demonstrate greater gender bias than human-written texts; (2) models demonstrate remarkably higher levels of intersectional bias than the other bias aspects; and (3) prompt-based mitigation is unstable and frequently exacerbates bias. Based on these observations, we propose Mitigation via Selective Rewrite (MSR), a novel bias mitigation strategy that leverages an agency classifier to identify and selectively revise parts of generated texts that demonstrate communal traits. Empirical results show MSR to be more effective and reliable than prompt-based mitigation methods, indicating a promising research direction.
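The MSR pipeline described above—classify each span of a generation, then rewrite only the spans flagged as communal—can be sketched as follows. Both components are stubs: `toy_agency_classifier` stands in for the paper's trained agency classifier (a real implementation would use a fine-tuned model, not the keyword matching the paper explicitly criticizes), and `toy_rewrite` stands in for an LLM call that regenerates a span with agentic framing. All function names here are illustrative assumptions, not the authors' API.

```python
# Sketch of Mitigation via Selective Rewrite (MSR): classify sentence-level
# spans, then selectively rewrite only those labeled "communal".
import re
from typing import Callable, List


def toy_agency_classifier(sentence: str) -> str:
    """Stub for the trained agency classifier (hypothetical).

    A real classifier would be a fine-tuned model; simple keyword
    matching is used here only to make the sketch runnable.
    """
    communal_cues = {"helped", "supported", "assisted", "cared"}
    words = {w.lower().strip(".,;!?") for w in sentence.split()}
    return "communal" if words & communal_cues else "agentic"


def toy_rewrite(sentence: str) -> str:
    """Stub for an LLM call that rewrites a communal span agentically."""
    return "[REWRITTEN] " + sentence


def mitigate_via_selective_rewrite(
    text: str,
    classify: Callable[[str], str],
    rewrite: Callable[[str], str],
) -> str:
    """Classify each sentence; rewrite only the communal ones."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    out: List[str] = []
    for s in sentences:
        out.append(rewrite(s) if classify(s) == "communal" else s)
    return " ".join(out)


sample = "She led the project to success. She helped colleagues with tasks."
print(mitigate_via_selective_rewrite(sample, toy_agency_classifier, toy_rewrite))
# The agentic first sentence is kept verbatim; only the communal second
# sentence is routed to the rewrite step.
```

The key design point, per the abstract, is selectivity: unlike prompt-based mitigation, which regenerates the whole output and can destabilize or exacerbate bias, MSR leaves already-agentic spans untouched.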