Homogeneity Bias as Differential Sampling Uncertainty in Language Models

📅 2025-01-31

🤖 AI Summary

This study examines homogeneity bias (reduced lexical diversity) in large language models (LLMs) and vision-language models (VLMs) when they describe marginalized groups (e.g., Black Americans, women). Method: We propose the first token-level uncertainty framework, quantifying bias with three measures: entropy, perplexity, and probability of differentiation. Applying it to GPT-4 Turbo, Llama-3.2, and multimodal models, we analyze sampling behavior during inference. Contribution/Results: We find that GPT-4 Turbo and Llama-3.2 exhibit systematically higher sampling certainty (lower entropy and perplexity) for marginalized groups, indicating reduced generative diversity, a previously uncharacterized source of representational bias. Crucially, this pattern does not generalize across all VLMs tested, which suggests that more than one mechanism contributes to the bias. Our work is the first to attribute homogeneity bias to disparities in sampling uncertainty distributions, offering a novel theoretical lens and a quantifiable methodology for diagnosing and mitigating representation bias in generative AI.

📝 Abstract

Prior research shows that Large Language Models (LLMs) and Vision-Language Models (VLMs) represent marginalized groups more homogeneously than dominant groups. However, the mechanisms underlying this homogeneity bias remain relatively unexplored. We propose that this bias emerges from systematic differences in the probability distributions from which tokens are sampled at inference time. Analyzing three measures of uncertainty in token sampling distributions (entropy, perplexity, and probability of differentiation), we find that in some models, specifically GPT-4 Turbo and Llama-3.2, tokens are sampled more deterministically when generating texts about marginalized groups (i.e., Black Americans and women) than when generating texts about their dominant group counterparts (i.e., White Americans and men). While these findings may help explain homogeneity bias in certain models, the patterns did not replicate across all VLMs tested, suggesting that multiple mechanisms may contribute to homogeneity bias in AI.
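
The abstract names three uncertainty measures but does not give their formulas. As a minimal sketch, assuming the standard textbook definitions (Shannon entropy in nats, perplexity as the exponential of entropy, and probability of differentiation as one minus the sum of squared probabilities), they could be computed for a single next-token distribution as follows. The function names and the two example distributions are hypothetical illustrations, not the paper's code or data.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def perplexity(probs):
    """Exponential of the entropy: the effective number of equally likely tokens."""
    return math.exp(entropy(probs))


def probability_of_differentiation(probs):
    """Chance that two independent draws from the distribution differ."""
    return 1.0 - sum(p * p for p in probs)


# Hypothetical next-token distributions over four candidate tokens.
# Assumption for illustration only: a "flatter" and a "more peaked" case.
flat = [0.25, 0.25, 0.25, 0.25]
peaked = [0.85, 0.05, 0.05, 0.05]

for name, dist in [("flat", flat), ("peaked", peaked)]:
    print(
        name,
        round(entropy(dist), 4),
        round(perplexity(dist), 4),
        round(probability_of_differentiation(dist), 4),
    )
```

Under these assumed definitions, a more peaked distribution scores lower on all three measures, which is the sense in which the abstract describes tokens as being sampled "more deterministically".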

Problem

Research questions and friction points this paper is trying to address.

Language Models

Homogeneity Bias

Minority Representation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Homogeneity Bias

Language Models

Minority Representation
