Evaluating LLMs for Demographic-Targeted Social Bias Detection: A Comprehensive Benchmark Study

📅 2025-10-06

📈 Citations: 0

✨ Influential: 0

career value

201K/year

🤖 AI Summary

Prior work lacks systematic evaluation of large language models’ (LLMs) capability to detect social biases—particularly across multiple demographic groups, diverse content types (e.g., hate speech, stereotypes), and intersectional bias configurations. Method: We introduce the first multi-label bias detection benchmark for English text, grounded in a fine-grained demographic taxonomy and spanning 12 heterogeneous datasets, multiple content categories, and cross-cutting demographic axes (e.g., gender × race). We conduct comprehensive benchmarking across LLMs of varying scales and architectures using prompt engineering, in-context learning, and supervised fine-tuning. Contribution/Results: Fine-tuned smaller models outperform larger ones in scalability and generalization; however, all models exhibit substantial performance gaps in detecting intersectional biases. This work establishes a standardized evaluation paradigm and delivers critical empirical insights for bias detection research.

Technology Category

Application Category

📝 Abstract

Large-scale web-scraped text corpora used to train general-purpose AI models often contain harmful demographic-targeted social biases, creating a regulatory need for data auditing and developing scalable bias-detection methods. Although prior work has investigated biases in text datasets and related detection methods, these studies remain narrow in scope. They typically focus on a single content type (e.g., hate speech), cover limited demographic axes, overlook biases affecting multiple demographics simultaneously, and analyze limited techniques. Consequently, practitioners lack a holistic understanding of the strengths and limitations of recent large language models (LLMs) for automated bias detection. In this study, we present a comprehensive evaluation framework aimed at English texts to assess the ability of LLMs in detecting demographic-targeted social biases. To align with regulatory requirements, we frame bias detection as a multi-label task using a demographic-focused taxonomy. We then conduct a systematic evaluation with models across scales and techniques, including prompting, in-context learning, and fine-tuning. Using twelve datasets spanning diverse content types and demographics, our study demonstrates the promise of fine-tuned smaller models for scalable detection. However, our analyses also expose persistent gaps across demographic axes and multi-demographic targeted biases, underscoring the need for more effective and scalable auditing frameworks.

Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs for detecting demographic-targeted social biases in texts

Assessing bias detection across multiple demographic axes and content types

Identifying gaps in current methods for scalable social bias auditing

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-label bias detection with demographic taxonomy

Systematic evaluation across model scales and techniques

Fine-tuned smaller models for scalable detection

🔎 Similar Papers

Human and LLM Biases in Hate Speech Annotations: A Socio-Demographic Analysis of Annotators and Targets