Prompting Fairness: Integrating Causality to Debias Large Language Models

📅 2024-03-13
📈 Citations: 11
Influential: 1
📄 PDF
🤖 AI Summary
Large language models (LLMs) exhibit discriminatory outputs in high-stakes domains (e.g., hiring, healthcare) when exposed to socially sensitive attributes (e.g., gender, race), undermining fairness and reliability. Method: We propose a causality-guided prompt tuning framework—the first to integrate causal intervention into prompt engineering. It constructs a causal graph to identify and block unfair causal pathways, decouples sensitive attributes from decision logic via counterfactual reasoning and structured prompting, and introduces an interpretable path-selection mechanism that steers the model toward factual rather than stereotypical cues. Contribution/Results: Our method requires no fine-tuning or gradient access—operating solely via black-box API calls—and achieves an average 37.2% reduction in bias metrics across multiple real-world datasets spanning diverse domains. It unifies and extends existing prompt-based debiasing paradigms while ensuring transparency and adaptability.

Technology Category

Application Category

📝 Abstract
Large language models (LLMs), despite their remarkable capabilities, are susceptible to generating biased and discriminatory responses. As LLMs increasingly influence high-stakes decision-making (e.g., hiring and healthcare), mitigating these biases becomes critical. In this work, we propose a causality-guided debiasing framework to tackle social biases, aiming to reduce the objectionable dependence between LLMs' decisions and the social information in the input. Our framework introduces a novel perspective to identify how social information can affect an LLM's decision through different causal pathways. Leveraging these causal insights, we outline principled prompting strategies that regulate these pathways through selection mechanisms. This framework not only unifies existing prompting-based debiasing techniques, but also opens up new directions for reducing bias by encouraging the model to prioritize fact-based reasoning over reliance on biased social cues. We validate our framework through extensive experiments on real-world datasets across multiple domains, demonstrating its effectiveness in debiasing LLM decisions, even with only black-box access to the model.
Problem

Research questions and friction points this paper is trying to address.

Mitigate biases in large language models (LLMs).
Reduce dependence on social information in LLMs.
Promote fact-based reasoning over biased social cues.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Causality-guided debiasing framework for LLMs
Principled prompting strategies regulate causal pathways
Encourages fact-based reasoning over biased social cues
🔎 Similar Papers
No similar papers found.
Jingling Li
Jingling Li
Google DeepMind
Artificial IntelligenceMachine LearningTheoretical Computer Science
Zeyu Tang
Zeyu Tang
Postdoctoral Scholar, Stanford University
Trustworthy AICausalityComputational Justice
X
Xiaoyu Liu
Department of Computer Science, University of Maryland, College Park
P
P. Spirtes
Department of Philosophy, Carnegie Mellon University
K
Kun Zhang
Department of Philosophy, Carnegie Mellon University; Machine Learning Department, Mohamed bin Zayed University of Artificial Intelligence
Liu Leqi
Liu Leqi
UT Austin
Artificial IntelligenceMachine Learning
Y
Yang Liu
Computer Science and Engineering Department, University of California, Santa Cruz