BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses

📅 2025-09-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing LLM debiasing evaluations lack a standardized benchmark and rely predominantly on probability-difference metrics, which misalign with real-world interactive use, where users judge model *responses*, not token probabilities. Method: BiasFreeBench is a response-level debiasing benchmark covering both multiple-choice QA and open-ended multi-turn QA, built by reorganizing existing datasets into a unified query-response setting. It introduces a response-level metric, the Bias-Free Score, and evaluates eight mainstream prompting- and training-based debiasing methods across model sizes, generalization to unseen bias types, and diverse bias dimensions (e.g., gender, race, religion). Contribution/Results: Experiments reveal substantial performance disparities among methods across bias types, exposing limitations of current approaches and narrowing the gap between evaluation practice and real-world deployment. The benchmark will be publicly released to support standardization and reproducibility in LLM debiasing research.

📝 Abstract
Existing studies on bias mitigation methods for large language models (LLMs) use diverse baselines and metrics to evaluate debiasing performance, leading to inconsistent comparisons among them. Moreover, their evaluations are mostly based on the comparison between LLMs' probabilities of biased and unbiased contexts, which ignores the gap between such evaluations and real-world use cases where users interact with LLMs by reading model responses and expect fair and safe outputs rather than LLMs' probabilities. To enable consistent evaluation across debiasing methods and bridge this gap, we introduce BiasFreeBench, an empirical benchmark that comprehensively compares eight mainstream bias mitigation techniques (covering four prompting-based and four training-based methods) on two test scenarios (multi-choice QA and open-ended multi-turn QA) by reorganizing existing datasets into a unified query-response setting. We further introduce a response-level metric, Bias-Free Score, to measure the extent to which LLM responses are fair, safe, and anti-stereotypical. Debiasing performances are systematically compared and analyzed across key dimensions: the prompting vs. training paradigm, model size, and generalization of different training strategies to unseen bias types. We will publicly release our benchmark, aiming to establish a unified testbed for bias mitigation research.
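The abstract defines the Bias-Free Score only informally, as "the extent to which LLM responses are fair, safe, and anti-stereotypical." A minimal sketch of what such a response-level metric could look like, assuming (this is an illustration, not the paper's actual formula) that each response is judged into one of a few categories and the score is the fraction of bias-free responses:

```python
# Hypothetical sketch of a response-level bias metric. The paper does not
# publish the exact Bias-Free Score formula; here we assume each model
# response has been judged (e.g., by a classifier or human annotator) as
# "fair", "safe", "anti-stereotypical", or "biased", and the score is the
# fraction of responses falling into a bias-free category.

BIAS_FREE_LABELS = {"fair", "safe", "anti-stereotypical"}

def bias_free_score(labels: list[str]) -> float:
    """Return the fraction of responses judged bias-free."""
    if not labels:
        raise ValueError("need at least one judged response")
    return sum(label in BIAS_FREE_LABELS for label in labels) / len(labels)

# Example: four judged responses, one of which is biased.
judgments = ["fair", "biased", "anti-stereotypical", "safe"]
print(bias_free_score(judgments))  # 0.75
```

Unlike probability-difference metrics, which compare the model's likelihoods for biased vs. unbiased continuations, a score like this operates directly on the generated responses users actually read, which is the gap the benchmark aims to close.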
Problem

Research questions and friction points this paper is trying to address.

Addresses inconsistent evaluation of bias mitigation methods across LLM studies
Bridges the gap between probability-based metrics and real-world user interactions
Systematically compares prompting- and training-based debiasing methods across test scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Standardizes comparison of bias mitigation methods under a unified benchmark
Introduces Bias-Free Score, a response-level metric for fairness evaluation
Systematically analyzes debiasing across paradigm, model size, and generalization to unseen bias types