🤖 AI Summary
Prior work lacks cross-model comparability in quantifying bias across large language models (LLMs), hindering fair evaluation. Method: We systematically quantify bias similarity across 13 LLMs from five families, using two datasets (4K human-annotated and 1M synthetically generated questions), coupled with large-scale prompt-response analysis within a rigorous cross-model comparative framework. Contributions/Results: (1) Fine-tuning has minimal impact on bias; (2) closed-source models disproportionately rely on refusal strategies to suppress bias, sacrificing utility; (3) open-source models (e.g., Llama3-Chat, Gemma2-it) achieve fairness on par with GPT-4; (4) "unknown response" policies exacerbate the utility-fairness trade-off; (5) bias scores on disambiguated questions are more extreme, raising the risk of reverse discrimination. Our findings challenge the prevailing "closed-source = fairer" assumption and advocate a redefined evaluation paradigm that jointly optimizes fairness and practical utility.
📝 Abstract
Bias in machine learning models, particularly in Large Language Models, is a critical issue as these systems shape important societal decisions. While previous studies have examined bias in individual LLMs, comparisons of bias across models remain underexplored. To address this gap, we analyze 13 LLMs from five families, evaluating bias through output distributions across multiple dimensions using two datasets (4K and 1M questions). Our results show that fine-tuning has minimal impact on output distributions, and that proprietary models tend to over-respond with "unknown" answers to minimize bias, compromising accuracy and utility. In addition, open-source models like Llama3-Chat and Gemma2-it demonstrate fairness comparable to proprietary models like GPT-4, challenging the assumption that larger, closed-source models are inherently less biased. We also find that bias scores for disambiguated questions are more extreme, raising concerns about reverse discrimination. These findings highlight the need for improved bias mitigation strategies and more comprehensive evaluation metrics for fairness in LLMs.