The Impact of Unstated Norms in Bias Analysis of Language Models

📅 2024-04-04

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

This paper identifies a systematic mismatch between mainstream template-based counterfactual bias detection methods, which state group identity explicitly (e.g., “Black president” vs. “White president”), and the implicit linguistic norms of pretraining corpora, in particular markedness (e.g., *Black president* is marked, while unmarked *president* defaults to White). This mismatch distorts bias measurement: models appear to label White-associated text as negative at higher rates than text associated with other groups. Through controlled experiments, formal modeling of linguistic norms, and empirical evaluation, the study is the first to uncover how unstated norms, particularly markedness, confound bias quantification, which challenges the validity of templated evaluation. Its contributions are threefold: (1) establishing implicit linguistic norms as critical confounding variables in bias measurement; (2) proposing a more natural, minimally interventionist evaluation paradigm; and (3) providing theoretical foundations and concrete pathways for developing robust, norm-aware bias quantification methods.

📝 Abstract

Bias in large language models (LLMs) has many forms, from overt discrimination to implicit stereotypes. Counterfactual bias evaluation is a widely used approach to quantifying bias and often relies on template-based probes that explicitly state group membership. It measures whether the outcome of a task performed by an LLM is invariant to a change in group membership. In this work, we find that template-based probes can lead to unrealistic bias measurements. For example, LLMs appear to mistakenly cast text associated with White race as negative at higher rates than other groups. We hypothesize that this arises artificially via a mismatch between commonly unstated norms, in the form of markedness, in the pretraining text of LLMs (e.g., Black president vs. president) and templates used for bias measurement (e.g., Black president vs. White president). The findings highlight the potential misleading impact of varying group membership through explicit mention in counterfactual bias quantification.
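
The counterfactual setup described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption for illustration and not taken from the paper: the templates, the group list, the `toy_classifier` stand-in (a hard-coded rule, not a real LLM), and the gap metric (each group's negative rate minus the mean across groups). Passing an empty group string produces the unmarked form ("president") that the abstract contrasts with explicit mentions.

```python
from typing import Callable, Dict, List

# Hypothetical templates; "{group}" is the slot for an explicit group mention.
TEMPLATES: List[str] = [
    "The {group} president gave a speech.",
    "My {group} neighbor is a doctor.",
]
GROUPS: List[str] = ["Black", "White", "Asian"]  # illustrative only


def fill(template: str, group: str) -> str:
    """Insert a group term; an empty string yields the unmarked form."""
    # split/join collapses the double space left behind by an empty slot.
    return " ".join(template.format(group=group).split())


def negative_rate(classify: Callable[[str], str],
                  templates: List[str], group: str) -> float:
    """Fraction of filled templates that the classifier labels 'negative'."""
    labels = [classify(fill(t, group)) for t in templates]
    return labels.count("negative") / len(labels)


def counterfactual_gaps(classify: Callable[[str], str],
                        templates: List[str],
                        groups: List[str]) -> Dict[str, float]:
    """Each group's negative rate minus the mean rate over all groups.

    A task outcome that is invariant to group membership gives 0.0 everywhere.
    """
    rates = {g: negative_rate(classify, templates, g) for g in groups}
    mean = sum(rates.values()) / len(rates)
    return {g: r - mean for g, r in rates.items()}


def toy_classifier(text: str) -> str:
    """Stand-in for an LLM classifier (assumption, not a real model).

    It flags any text that mentions 'White', purely so the gap is non-zero.
    """
    return "negative" if "White" in text else "neutral"


if __name__ == "__main__":
    print(fill(TEMPLATES[0], ""))       # unmarked form
    print(fill(TEMPLATES[0], "White"))  # explicitly marked form
    print(counterfactual_gaps(toy_classifier, TEMPLATES, GROUPS))
```

In practice `classify` would wrap a call to the model under evaluation. Comparing rates on the unmarked form against the explicitly marked forms is one way to probe the mismatch the abstract hypothesizes, though the paper's own protocol may differ.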

Problem

Research questions and friction points this paper is trying to address.

Unstated norms affect bias analysis

Template-based probes yield unrealistic bias measurements

Mismatch between pretraining norms and bias templates

Innovation

Methods, ideas, or system contributions that make the work stand out.

Counterfactual bias evaluation

Template-based probes

Unstated norms mismatch

🔎 Similar Papers

From Prejudice to Parity: A New Approach to Debiasing Large Language Model Word Embeddings