🤖 AI Summary
Prior work lacks cross-model comparability in quantifying bias across large language models (LLMs), hindering fair evaluation. Method: We systematically quantify bias similarity across 13 LLMs from five families, using two datasets (4K human-annotated and 1M synthetically generated questions), coupled with large-scale prompt-response analysis within a rigorous cross-model comparative framework. Contributions/Results: (1) Fine-tuning has minimal impact on bias; (2) closed-source models disproportionately rely on refusal strategies to suppress bias, sacrificing utility; (3) open-source models (e.g., Llama3-Chat, Gemma2-it) achieve fairness on par with GPT-4; (4) "unknown response" policies exacerbate the utility-fairness trade-off; (5) bias scores on disambiguated questions are more extreme, raising the risk of reverse discrimination. Our findings challenge the prevailing "closed-source = fairer" assumption and advocate a redefined evaluation paradigm that jointly optimizes fairness and practical utility.
📝 Abstract
Bias in machine learning models, particularly in Large Language Models, is a critical issue as these systems shape important societal decisions. While previous studies have examined bias in individual LLMs, comparisons of bias across models remain underexplored. To address this gap, we analyze 13 LLMs from five families, evaluating bias through output distributions across multiple dimensions using two datasets (4K and 1M questions). Our results show that fine-tuning has minimal impact on output distributions, and that proprietary models tend to over-respond with "unknown" answers to minimize bias, compromising accuracy and utility. In addition, open-source models like Llama3-Chat and Gemma2-it demonstrate fairness comparable to proprietary models like GPT-4, challenging the assumption that larger, closed-source models are inherently less biased. We also find that bias scores for disambiguated questions are more extreme, raising concerns about reverse discrimination. These findings highlight the need for improved bias mitigation strategies and more comprehensive evaluation metrics for fairness in LLMs.