Exploring Causal Effect of Social Bias on Faithfulness Hallucinations in Large Language Models

📅 2025-08-11
📈 Citations: 0 (influential citations: 0)
🤖 AI Summary
This study empirically investigates, for the first time, whether social bias causally induces faithfulness hallucinations (i.e., outputs inconsistent with the input) in large language models (LLMs). Method: Grounded in structural causal modeling (SCM), the authors construct a causal graph that identifies the pathway from bias states to hallucinations and design bias intervention techniques to control contextual confounders. They introduce BID, the first bias-intervention dataset tailored for causal analysis, enabling quantitative assessment of the direction and magnitude of bias effects across dimensions such as gender and race. Contribution/Results: Experiments on diverse state-of-the-art LLMs demonstrate that social bias significantly increases the probability of faithfulness hallucinations, particularly unfairness hallucinations, and that this causal effect is both model-agnostic and quantitatively measurable. The work establishes the first causal link between social bias and faithfulness hallucination in LLMs and provides a reproducible causal analysis framework along with publicly available resources (the BID dataset and intervention protocols).
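For intuition, the causal quantity in an SCM-based analysis of this kind is an interventional contrast between bias states. The following is a hedged sketch in our own notation, not the paper's formalism, where H is a hallucination indicator and B the bias state:

```latex
% Sketch in our notation, not the paper's: H = hallucination indicator,
% B = bias state, do(.) = Pearl's intervention operator.
\mathrm{ATE}(b, b') = \mathbb{E}\left[ H \mid \mathrm{do}(B = b) \right] - \mathbb{E}\left[ H \mid \mathrm{do}(B = b') \right]
```

A positive value means that setting the bias state to b rather than b' raises the hallucination probability; the finding that "the effect of each bias state differs in direction" corresponds to this contrast changing sign across bias states.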

📝 Abstract
Large language models (LLMs) have achieved remarkable success in various tasks, yet they remain vulnerable to faithfulness hallucinations, where the output does not align with the input. In this study, we investigate whether social bias contributes to these hallucinations, a causal relationship that has not been explored. A key challenge is controlling confounders within the context, which complicates the isolation of causality between bias states and hallucinations. To address this, we utilize the Structural Causal Model (SCM) to establish and validate the causality and design bias interventions to control confounders. In addition, we develop the Bias Intervention Dataset (BID), which includes various social biases, enabling precise measurement of causal effects. Experiments on mainstream LLMs reveal that biases are significant causes of faithfulness hallucinations, and the effect of each bias state differs in direction. We further analyze the scope of these causal effects across various models, specifically focusing on unfairness hallucinations, which are primarily targeted by social bias, revealing the subtle yet significant causal effect of bias on hallucination generation.
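To make the measurement concrete, here is a hedged sketch (ours, not the authors' released code) of how such a causal effect could be estimated: compare hallucination rates over paired prompts with and without a bias intervention, mirroring the E[H | do(B=b)] - E[H | do(B=b')] contrast above. The substring-based faithfulness check is a toy stand-in; a real pipeline would use an NLI model or human judgment.

```python
# Hedged sketch (ours, not the authors' released code): estimate the causal
# effect of a bias intervention on the hallucination rate as a difference in
# means over paired prompts, i.e., an E[H | do(B=b)] - E[H | do(B=b')] contrast.
from statistics import mean

def hallucination_rate(outputs: list[str], references: list[str]) -> float:
    """Toy faithfulness check: flag an output as a hallucination when it does
    not contain the reference fact (case-insensitive). A real pipeline would
    use an NLI model or human judgment instead of substring matching."""
    return mean(ref.lower() not in out.lower()
                for out, ref in zip(outputs, references))

def average_treatment_effect(outputs_biased, outputs_neutral, references):
    """ATE estimate: hallucination rate under the biased condition minus the
    rate under the matched neutral (control) condition."""
    return (hallucination_rate(outputs_biased, references)
            - hallucination_rate(outputs_neutral, references))

# Toy usage; a real experiment would query an LLM once per prompt variant.
refs = ["worked at the clinic for 12 years", "the meeting is on Tuesday"]
biased_outputs = ["He has worked there for 5 years.",
                  "The meeting is on Tuesday."]
neutral_outputs = ["She worked at the clinic for 12 years.",
                   "The meeting is on Tuesday."]
print(average_treatment_effect(biased_outputs, neutral_outputs, refs))  # 0.5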
Problem

Research questions and friction points this paper is trying to address.

Investigates whether social bias causes faithfulness hallucinations in LLMs
Uses a Structural Causal Model to isolate the causality between bias states and hallucinations
Analyzes how the direction of the causal effect differs across bias states
Innovation

Methods, ideas, or system contributions that make the work stand out.

Utilizes a Structural Causal Model to establish and validate causality
Develops the Bias Intervention Dataset (BID) for precise measurement of causal effects (see the record sketch after this list)
Analyzes the scope of bias effects across various mainstream LLMs
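To make the dataset idea concrete, here is a hedged sketch of what a single BID-style intervention record might contain; the field names and example are our assumption, not the released schema.

```python
# Hedged sketch of a BID-style intervention record
# (field names are our assumption; the released dataset may differ).
from dataclasses import dataclass

@dataclass
class BiasInterventionExample:
    context: str          # shared input facts the model must stay faithful to
    question: str         # query posed over the context
    bias_dimension: str   # e.g., "gender" or "race"
    neutral_prompt: str   # control condition with no bias cue
    biased_prompt: str    # same content with the bias state intervened on
    reference: str        # gold answer used to score faithfulness

example = BiasInterventionExample(
    context="Alex, a nurse, has worked at the clinic for 12 years.",
    question="How long has the nurse worked at the clinic?",
    bias_dimension="gender",
    neutral_prompt="Alex, a nurse, has worked at the clinic for 12 years.",
    biased_prompt="Alex, a male nurse, has worked at the clinic for 12 years.",
    reference="12 years",
)
```

Pairing a neutral and a biased prompt over the same underlying facts is what lets the analysis treat the bias cue as an intervention while holding contextual confounders fixed.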
Zhenliang Zhang
Wangxuan Institute of Computer Technology, Peking University; School of Software and Microelectronics, Peking University
Junzhe Zhang
Syracuse University
Causal Inference, Artificial Intelligence
Xinyu Hu
Wangxuan Institute of Computer Technology, Peking University
Huixuan Zhang
Peking University
Natural Language Processing
Xiaojun Wan
Peking University
Natural Language Processing, Text Mining, Artificial Intelligence