🤖 AI Summary
This work investigates whether natural language inference (NLI) data generated by large language models (LLMs), specifically GPT-4, Llama-2 70b Chat, and Mistral 7b Instruct, inherits the annotation artifacts and societal biases (e.g., gender, race, and age stereotypes) found in human-annotated NLI datasets. We construct LLM-generated NLI subsets and empirically identify severe hypothesis-only artifacts and stereotypical social biases, the first such evidence in synthetic NLI data. To detect these biases systematically, we propose a dual-path framework: (1) a fine-tuned BERT hypothesis-only classifier, and (2) pointwise mutual information (PMI) analysis of bias-associated lexical patterns. Experiments show the framework achieves 86-96% classification accuracy on LLM-generated data, substantially higher than its accuracy on human-annotated data, and quantitatively identifies multiple bias-correlated lexical terms. Our findings provide both a novel methodology and critical empirical evidence for bias assessment and mitigation in LLM-synthesized training data.
📝 Abstract
We test whether NLP datasets created with Large Language Models (LLMs) contain annotation artifacts and social biases, as NLP datasets elicited from crowdsourced workers do. We recreate a portion of the Stanford Natural Language Inference corpus using GPT-4, Llama-2 70b Chat, and Mistral 7b Instruct. We train hypothesis-only classifiers to determine whether LLM-elicited NLI datasets contain annotation artifacts. Next, we use pointwise mutual information to identify the words in each dataset that are associated with gender-, race-, and age-related terms. On our LLM-generated NLI datasets, fine-tuned BERT hypothesis-only classifiers achieve between 86% and 96% accuracy. Our analyses further characterize the annotation artifacts and stereotypical biases in LLM-generated datasets.
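The hypothesis-only probe works because an NLI label should be unrecoverable from the hypothesis alone: a model that never sees the premise should perform near chance (~33% for three classes), so above-chance accuracy signals annotation artifacts in the hypotheses. The paper fine-tunes BERT; the sketch below substitutes a tiny stdlib Naive Bayes to show the same idea, with invented toy data that mimics one known artifact (negation words in contradiction hypotheses). Nothing here is the paper's actual data or model.

```python
import math
from collections import Counter

class HypothesisOnlyNB:
    """Multinomial Naive Bayes trained on hypotheses alone (premises discarded).
    Above-chance accuracy on held-out hypotheses signals annotation artifacts."""

    def fit(self, hypotheses, labels):
        self.prior = Counter(labels)                      # label frequencies
        self.counts = {l: Counter() for l in self.prior}  # word counts per label
        for hyp, label in zip(hypotheses, labels):
            self.counts[label].update(hyp.split())
        self.vocab = {w for c in self.counts.values() for w in c}
        return self

    def predict(self, hypothesis):
        n = sum(self.prior.values())
        def log_score(label):
            total = sum(self.counts[label].values()) + len(self.vocab)
            s = math.log(self.prior[label] / n)
            for w in hypothesis.split():                  # add-one smoothing
                s += math.log((self.counts[label][w] + 1) / total)
            return s
        return max(self.prior, key=log_score)

# Toy training set mimicking known NLI artifacts: negation words mark
# contradictions, generic statements mark entailments (illustrative only).
hyps = ["someone is outdoors", "a person is moving",
        "nobody is outside", "the person is not moving",
        "the tall person is outdoors", "someone is winning the race"]
labels = ["entailment", "entailment", "contradiction", "contradiction",
          "neutral", "neutral"]
clf = HypothesisOnlyNB().fit(hyps, labels)
pred = clf.predict("nobody is moving")  # label guessed without any premise
```

If such a classifier generalizes well above chance, the label is leaking through lexical cues in the hypothesis, which is exactly the artifact the paper measures in LLM-elicited data.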
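The PMI analysis scores how strongly a word co-occurs with a demographic term, PMI(w, t) = log2 p(w, t) / (p(w) p(t)), with high scores flagging stereotype-correlated vocabulary. A minimal sketch using sentence-level co-occurrence counts; the hypotheses and terms here are invented for illustration, while the paper applies this to LLM-generated NLI data with gender-, race-, and age-related word lists:

```python
import math

def pmi(word, term, sentences):
    """PMI(word, term) = log2( p(word, term) / (p(word) * p(term)) ),
    with probabilities estimated from sentence-level co-occurrence."""
    n = len(sentences)
    p_w = sum(word in s for s in sentences) / n
    p_t = sum(term in s for s in sentences) / n
    p_wt = sum(word in s and term in s for s in sentences) / n
    if p_wt == 0:
        return float("-inf")  # the pair never co-occurs
    return math.log2(p_wt / (p_w * p_t))

# Invented toy hypotheses, tokenized into word lists.
hypotheses = [
    "the woman is cooking dinner".split(),
    "the woman is cooking at home".split(),
    "a man is driving a truck".split(),
    "a man is reading".split(),
]
score = pmi("cooking", "woman", hypotheses)  # 1.0: "cooking" skews female here
```

A positive PMI means the word appears with the demographic term more often than chance would predict; ranking words by PMI against each term surfaces the bias-correlated vocabulary the paper reports.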