With a Grain of SALT: Are LLMs Fair Across Social Dimensions?

📅 2024-10-16
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses systemic social biases, particularly along gender, religion, and race, in open-source large language models (LLMs). To this end, the authors propose SALT, a multi-scenario fairness evaluation framework. Methodologically, they construct the SALT benchmark, a curated dataset spanning five realistic scenarios; decouple evaluation bias, positional bias, and length bias in the assessment; and integrate dual-track validation via DeepSeek-R1-based automated evaluation and human annotation of anonymized outputs. The contributions are threefold: (1) a reproducible, multidimensional, and extensible bias benchmark for open-source LLMs; (2) empirical evidence of cross-dimensional polarization and systematic assignment of negative roles in mainstream models (e.g., Llama, Gemma); and (3) quantitative tools, including win-rate statistics and negative-role detection, that give fairness research a rigorous, interpretable footing.
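The win-rate statistic mentioned above can be made concrete with a short sketch. The paper's exact aggregation is not reproduced here; the following is a minimal illustration, assuming debate outcomes are recorded as (demographic_group, won) pairs, with a hypothetical `win_rates` helper.

```python
from collections import Counter

def win_rates(debate_outcomes):
    """Compute per-group win rates from (group, won) records.

    `debate_outcomes` is an iterable of (demographic_group, bool) pairs,
    e.g. [("group_a", True), ("group_b", False), ...]. A large gap in
    win rates across groups signals polarization.
    """
    wins, totals = Counter(), Counter()
    for group, won in debate_outcomes:
        totals[group] += 1
        if won:
            wins[group] += 1
    return {g: wins[g] / totals[g] for g in totals}

# Toy example, not real data from the paper:
outcomes = [("group_a", True), ("group_a", True),
            ("group_b", False), ("group_b", True)]
print(win_rates(outcomes))  # {'group_a': 1.0, 'group_b': 0.5}
```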

📝 Abstract
This paper presents a systematic analysis of biases in open-source Large Language Models (LLMs) across gender, religion, and race. Our study evaluates bias in smaller-scale Llama and Gemma models using the SALT (Social Appropriateness in LLM-Generated Text) dataset, which incorporates five distinct bias triggers: General Debate, Positioned Debate, Career Advice, Problem Solving, and CV Generation. To quantify bias, we measure win rates in General Debate and the assignment of negative roles in Positioned Debate. For real-world use cases, such as Career Advice, Problem Solving, and CV Generation, we anonymize the outputs to remove explicit demographic identifiers and use DeepSeek-R1 as an automated evaluator. We also address inherent biases in LLM-based evaluation, including evaluation bias, positional bias, and length bias, and validate our results through human evaluations. Our findings reveal consistent polarization across models, with certain demographic groups receiving systematically favorable or unfavorable treatment. By introducing SALT, we provide a comprehensive benchmark for bias analysis and underscore the need for robust bias mitigation strategies in the development of equitable AI systems.
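One standard control for the positional bias noted in the abstract is to judge each pair of responses in both orders and keep only verdicts that survive the swap. The sketch below illustrates that idea only; `query_judge` is a hypothetical placeholder for a call to an LLM judge such as DeepSeek-R1, not the paper's actual evaluation code.

```python
def query_judge(prompt: str, response_a: str, response_b: str) -> str:
    """Ask the judge which response is better; returns 'A', 'B', or 'tie'.
    Placeholder: wire this to a real LLM API in practice."""
    raise NotImplementedError

def debiased_preference(prompt: str, r1: str, r2: str) -> str:
    """Judge the pair in both orders; count a win only if it survives the swap."""
    first = query_judge(prompt, r1, r2)   # r1 shown in position A
    second = query_judge(prompt, r2, r1)  # orders swapped
    if first == "A" and second == "B":
        return "r1"   # r1 preferred regardless of position
    if first == "B" and second == "A":
        return "r2"   # r2 preferred regardless of position
    return "tie"      # verdict flipped with position: treat as positional bias
```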
Problem

Research questions and friction points this paper is trying to address.

How fairly do open-source LLMs treat demographic groups across gender, religion, and race?
How can bias be elicited and measured in realistic usage scenarios?
What evidence is needed to motivate robust bias mitigation strategies for equitable AI systems?
Innovation

Methods, ideas, or system contributions that make the work stand out.

SALT dataset spanning five bias-trigger scenarios
Anonymized outputs scored by DeepSeek-R1 as an automated evaluator (see the sketch after this list)
Human evaluations that validate the automated results
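For illustration, one lightweight way to strip explicit demographic identifiers before automated judging is placeholder substitution. The term list below is invented for this example and is not the paper's identifier inventory; the actual anonymization procedure may differ.

```python
import re

# Illustrative term list only; the paper's identifier inventory is not given here.
DEMOGRAPHIC_TERMS = {
    r"\b(he|him|his)\b": "[PRONOUN]",
    r"\b(she|her|hers)\b": "[PRONOUN]",
    r"\b(Muslim|Christian|Hindu|Jewish|Buddhist)\b": "[RELIGION]",
    r"\b(Black|White|Asian|Hispanic)\b": "[RACE]",
}

def anonymize(text: str) -> str:
    """Replace explicit demographic identifiers with neutral placeholders
    before the text is shown to the automated evaluator."""
    for pattern, placeholder in DEMOGRAPHIC_TERMS.items():
        text = re.sub(pattern, placeholder, text, flags=re.IGNORECASE)
    return text

print(anonymize("She is a Muslim engineer."))
# -> "[PRONOUN] is a [RELIGION] engineer."
```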