Prejudice and Volatility: A Statistical Framework for Measuring Social Discrimination in Large Language Models

📅 2024-02-23
📈 Citations: 1
✨ Influential: 0
🤖 AI Summary
Existing LLM alignment evaluations often overlook the randomness in stereotypes induced by generation inconsistency, which leads to misestimation of discrimination risk. This paper proposes the Prejudice-Volatility Framework (PVF), the first to decouple discrimination risk into two quantifiable dimensions: systematic prejudice and generation volatility. Leveraging token-level probability distributions, contextual modeling, and statistical risk metrics, PVF enables unified, cross-model and cross-task assessment of diverse inductive biases, including knowledge and social biases. Experiments across 12 mainstream LLMs reveal that: (1) prejudice dominates overall discrimination risk; (2) pro-male occupational stereotyping is pervasive; (3) RLHF reduces prejudice but increases volatility; and (4) discrimination severity correlates significantly with socioeconomic indicators such as occupational salary. The framework thus provides a principled, fine-grained, and empirically grounded approach to bias evaluation in generative language models.
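The decoupling described above suggests a mean-variance split. As an illustrative sketch only (the stereotype score s(c), the parity reference s*, and the squared-error form are assumptions here, not the paper's exact definitions), the aggregated risk of a model's stereotype score over applied contexts c could be written as:

```latex
\mathrm{Risk}
  = \mathbb{E}_{c}\!\left[\big(s(c) - s^{*}\big)^{2}\right]
  = \underbrace{\big(\mathbb{E}_{c}[s(c)] - s^{*}\big)^{2}}_{\text{prejudice risk}}
  + \underbrace{\operatorname{Var}_{c}\!\big[s(c)\big]}_{\text{volatility risk}}
```

Read this way, prejudice risk is the systematic offset of the expected stereotype from the unbiased reference, while volatility risk is the spread of that stereotype across contexts.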

📝 Abstract
This study investigates why and how inconsistency in the generation of Large Language Models (LLMs) might induce or exacerbate societal injustice. For instance, LLMs frequently exhibit contrasting gender stereotypes regarding the same career depending on varied contexts, highlighting the arguably harmful unpredictability of LLMs' behavioral patterns. To augment existing discrimination assessment with the capability to account for variation in LLM generation, we formulate the Prejudice-Volatility Framework (PVF), which precisely defines behavioral metrics for assessing LLMs by delineating the probability distribution of LLMs' stereotypes from the perspective of token prediction probability. Specifically, we employ a data-mining approach to approximate the possible applied contexts of LLMs and devise statistical metrics to evaluate the corresponding contextualized societal discrimination risk. Further, we mathematically dissect the aggregated discrimination risk of LLMs into prejudice risk, originating from their systematic bias, and volatility risk, stemming from their generation inconsistency. While initially intended for assessing discrimination in LLMs, our proposed PVF facilitates the comprehensive and flexible measurement of any inductive bias, including knowledge alongside prejudice, across models of various modalities. We apply PVF to 12 of the most commonly adopted LLMs and compare their risk levels. Our findings reveal that: i) prejudice risk is the primary cause of discrimination risk in LLMs, indicating that inherent biases in these models lead to stereotypical outputs; ii) most LLMs exhibit significant pro-male stereotypes across nearly all careers; iii) alignment with Reinforcement Learning from Human Feedback lowers discrimination by reducing prejudice but increases volatility; iv) discrimination risk in LLMs correlates with socio-economic factors such as profession salaries.
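To make the pipeline in the abstract concrete (mined contexts, stereotype scores from token prediction probabilities, risk split into prejudice and volatility), here is a minimal Python sketch. The mock token_probs function, the stereotype_score definition, the context templates, and the mean-variance decomposition are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def token_probs(context: str, career: str) -> tuple[float, float]:
    """Hypothetical stand-in for an LLM's next-token probabilities.

    In practice these would come from the model's softmax output for
    gendered continuations of the prompt; here a seeded RNG mocks them.
    """
    rng = np.random.default_rng(abs(hash((context, career))) % 2**32)
    p_male = rng.uniform(0.2, 0.8)
    return p_male, 1.0 - p_male

def stereotype_score(p_male: float, p_female: float) -> float:
    # Signed preference in [-1, 1]: > 0 leans male, < 0 leans female.
    return (p_male - p_female) / (p_male + p_female)

# Context templates approximating how the model is applied in practice
# (the paper mines such contexts from data; these are made up).
contexts = [
    "In the interview, the {career} said",
    "According to the news, the {career} was",
    "My neighbor works as a {career}, and",
]

career = "engineer"
scores = np.array([
    stereotype_score(*token_probs(c.format(career=career), career))
    for c in contexts
])

# Mean-variance decomposition: with parity reference s* = 0,
# E[s^2] = (E[s])^2 + Var(s), i.e. total = prejudice + volatility.
prejudice_risk = scores.mean() ** 2   # systematic deviation from parity
volatility_risk = scores.var()        # inconsistency across contexts
discrimination_risk = prejudice_risk + volatility_risk

print(f"prejudice={prejudice_risk:.3f}  "
      f"volatility={volatility_risk:.3f}  total={discrimination_risk:.3f}")
```

With only a handful of contexts the variance estimate is noisy; the paper's statistical metrics aggregate over far more contexts and careers than this toy example does.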
Problem

Research questions and friction points this paper is trying to address.

Analyzing stereotypes in LLMs via estimates of systematic bias and generation variation
Quantifying discrimination risk arising from prejudice and volatility in LLM outputs
Assessing the impact of generation inconsistency on LLM behavior
Innovation

Methods, ideas, or system contributions that make the work stand out.

Novel statistical framework for measuring stereotypes in LLMs
Prejudice-Volatility Framework (PVF) quantifies discrimination risk
Decomposes aggregated risk into prejudice and volatility components
Authors

Yiran Liu (Tsinghua University)
Ke Yang (University of Illinois Urbana-Champaign)
Zehan Qi (Tsinghua University)
Xiao Liu (Tsinghua University)
Yang Yu (Tsinghua University)
Chengxiang Zhai (University of Illinois Urbana-Champaign)