🤖 AI Summary
This paper identifies and systematically investigates “semantic leakage” in language models: a phenomenon in which models erroneously carry semantically irrelevant cues from the prompt (e.g., a stated color preference) into the generation (e.g., an occupational role), producing spurious, misleading associations.
Method: The authors formally define this bias, construct a multilingual, multi-scenario evaluation framework, and curate diverse diagnostic datasets. They employ controlled prompt perturbations, consistency analyses across languages and generation modes (zero-shot, few-shot, conversational), and both human and automatic evaluation to assess its prevalence.
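As a rough illustration of how such automatic detection could work (not the paper's exact implementation), the sketch below pairs a test prompt containing an irrelevant cue with a control prompt lacking it, then checks whether the test generation sits closer to the cue in embedding space. The model checkpoint, example prompts, and sample generations are all hypothetical placeholders.

```python
from sentence_transformers import SentenceTransformer, util

# Any sentence-embedding model works as the similarity proxy; this
# checkpoint is just a common default, not the one used in the paper.
embedder = SentenceTransformer("all-MiniLM-L6-v2")

def leakage_score(concept: str, test_gen: str, control_gen: str) -> float:
    """Score how much closer the test-prompt generation is to the
    (irrelevant) concept than the control-prompt generation is.
    A positive score suggests the concept leaked into the output."""
    c, t, ctl = embedder.encode([concept, test_gen, control_gen])
    return float(util.cos_sim(c, t) - util.cos_sim(c, ctl))

# Hypothetical model outputs for the paired prompts
#   test:    "He likes yellow. He works as a ..."
#   control: "He works as a ..."
score = leakage_score(
    concept="yellow",
    test_gen="school bus driver",     # completion after the color cue
    control_gen="software engineer",  # completion without the cue
)
print(f"leakage score: {score:+.3f}")  # > 0 flags likely semantic leakage
```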
Contribution/Results: Experiments on 13 flagship large language models confirm that semantic leakage is robust across languages and persists across prompting paradigms. The work introduces a new dimension for assessing model trustworthiness and provides a reproducible test suite for diagnosing such spurious associations.
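To turn per-example scores into a suite-level diagnostic, one plausible aggregate is the fraction of paired examples where the test generation wins; the example format below is hypothetical, and `leakage_score` refers to the sketch above. Under this paired setup, a rate near 50% corresponds to chance, so substantially higher rates indicate systematic leakage.

```python
def leak_rate(examples: list[dict]) -> float:
    """Fraction of test-suite examples whose test generation sits closer
    to the injected concept than the control generation does."""
    hits = sum(
        leakage_score(ex["concept"], ex["test_gen"], ex["control_gen"]) > 0
        for ex in examples
    )
    return hits / len(examples)
```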
📝 Abstract
Despite their wide adoption, the biases and unintended behaviors of language models remain poorly understood. In this paper, we identify and characterize a phenomenon never discussed before, which we call semantic leakage, where models leak irrelevant information from the prompt into the generation in unexpected ways. We propose an evaluation setting to detect semantic leakage both by humans and automatically, curate a diverse test suite for diagnosing this behavior, and measure significant semantic leakage in 13 flagship models. We also show that models exhibit semantic leakage in languages besides English and across different settings and generation scenarios. This discovery highlights yet another type of bias in language models that affects their generation patterns and behavior.