Actions Speak Louder than Words: Agent Decisions Reveal Implicit Biases in Language Models

📅 2025-01-29
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This study reveals that large language models (LLMs) exhibit significantly amplified implicit social biases: despite reductions in explicit bias, implicit bias increases by an average of 37% across state-of-the-art models, and the uncovered disparities are directionally aligned with real-world societal disparities. Method: We introduce a novel, socially grounded persona framework for LLM agents, explicitly modeling demographic attributes (e.g., gender, age, geography) to enable interpretable and reproducible behavioral auditing. We conduct multi-dimensional simulations and quantitative analysis across six mainstream LLMs, three demographic dimensions, and four decision-making scenarios. Contribution/Results: Our work provides the first empirical evidence that implicit bias does not attenuate with model advancement; instead, it systematically intensifies under zero-shot, unprompted conditions. The proposed framework establishes a new methodological foundation and critical benchmark for fairness evaluation in LLMs, enabling granular, behavior-level auditability beyond conventional token- or output-level metrics.

๐Ÿ“ Abstract
While advances in fairness and alignment have helped mitigate overt biases exhibited by large language models (LLMs) when explicitly prompted, we hypothesize that these models may still exhibit implicit biases when simulating human behavior. To test this hypothesis, we propose a technique to systematically uncover such biases across a broad range of sociodemographic categories by assessing decision-making disparities among agents with LLM-generated, sociodemographically-informed personas. Using our technique, we tested six LLMs across three sociodemographic groups and four decision-making scenarios. Our results show that state-of-the-art LLMs exhibit significant sociodemographic disparities in nearly all simulations, with more advanced models exhibiting greater implicit biases despite reducing explicit biases. Furthermore, when comparing our findings to real-world disparities reported in empirical studies, we find that the biases we uncovered are directionally aligned but markedly amplified. This directional alignment highlights the utility of our technique in uncovering systematic biases in LLMs rather than random variations; moreover, the presence and amplification of implicit biases emphasizes the need for novel strategies to address these biases.
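The audit described in the abstract can be sketched in a few lines: assign sociodemographic personas to LLM agents, pose the same decision scenario to each, and compare group-level positive-decision rates. The sketch below is purely illustrative, not the paper's actual implementation: the personas, scenarios, and the `agent_decision` stub (which deterministically simulates a biased agent so the code runs without model access) are all assumptions.

```python
# Hypothetical sketch of a persona-based behavioral audit.
# In a real audit, `agent_decision` would prompt an LLM with the
# persona and scenario and parse a binary decision from its response.

PERSONAS = [
    {"gender": "female", "age": 30},
    {"gender": "male", "age": 30},
]

SCENARIOS = ["approve a small-business loan", "shortlist a job candidate"]

def agent_decision(persona, scenario):
    """Stand-in for an LLM agent acting in-persona. Deterministically
    simulates a biased agent for illustration only."""
    return persona["gender"] == "male"

def disparity(personas, scenario, attribute):
    """Absolute gap in positive-decision rates between the groups
    defined by `attribute`, for a single decision scenario."""
    groups = {}
    for p in personas:
        groups.setdefault(p[attribute], []).append(agent_decision(p, scenario))
    rates = [sum(decisions) / len(decisions) for decisions in groups.values()]
    return max(rates) - min(rates)

for scenario in SCENARIOS:
    print(f"{scenario}: gender disparity = {disparity(PERSONAS, scenario, 'gender'):.2f}")
```

A disparity of 0 would indicate identical decision rates across groups; the paper's finding is that real models show significant non-zero gaps across nearly all scenario/group combinations.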
Problem

Research questions and friction points this paper is trying to address.

Implicit Bias
Large Language Models
Bias Mitigation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Implicit Bias
Language Models
Fairness Evaluation