How Does Differential Privacy Affect Social Bias in LLMs? A Systematic Evaluation

📅 2026-05-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the lack of systematic understanding of how differential privacy (DP) affects social bias in large language models (LLMs). The authors train a pretrained LLM with DP-SGD and compare it against non-DP baselines under a multi-paradigm fairness evaluation covering sentence scoring, text completion, tabular classification, and question answering. Their analysis shows that DP’s impact on bias is task-dependent: it reduces bias in sentence scoring, where bias is measured through controlled likelihood comparisons, but this improvement does not carry over to all other tasks. The work also finds a discrepancy between logit-level bias and output-level bias, and shows that reducing memorization does not necessarily reduce unfairness. These findings underscore the need for multi-paradigm evaluation when assessing fairness in LLMs trained with DP.
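
For readers unfamiliar with DP-SGD, the following is a minimal sketch of a single update step in plain Python: each example's gradient is clipped to a fixed L2 norm, the clipped gradients are summed, Gaussian noise scaled to the clipping norm is added, and the result is averaged. The function names, the parameters `max_norm` and `noise_multiplier`, and the toy gradients are illustrative assumptions, not the authors' training setup, and the sketch omits privacy accounting.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Scale one per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """One DP-SGD style update: clip per example, sum, add Gaussian noise, average."""
    batch_size = len(per_example_grads)
    # Clipping bounds how much any single example can move the parameters.
    clipped = [clip_gradient(g, max_norm) for g in per_example_grads]
    summed = [sum(column) for column in zip(*clipped)]
    # Noise standard deviation is proportional to the clipping norm.
    sigma = noise_multiplier * max_norm
    noisy_mean = [(s + rng.gauss(0.0, sigma)) / batch_size for s in summed]
    return [p - lr * g for p, g in zip(params, noisy_mean)]


# Toy usage with hypothetical gradients. noise_multiplier=0.0 disables the noise
# (and therefore any privacy guarantee) so that the effect of clipping is visible.
rng = random.Random(0)
params = [0.0, 0.0]
per_example_grads = [[3.0, 4.0], [0.3, 0.4]]
new_params = dp_sgd_step(params, per_example_grads, lr=0.1,
                         max_norm=1.0, noise_multiplier=0.0, rng=rng)
print(new_params)
```

In this toy call the first gradient (norm 5) is scaled down to norm 1, while the second (norm 0.5) passes through unchanged. Any positive `noise_multiplier` adds the randomness that the privacy guarantee relies on.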

📝 Abstract

Large language models (LLMs) trained on web-scale corpora can memorize sensitive training data, posing significant privacy risks. Differential privacy (DP) has emerged as a principled framework that limits the influence of individual data points during training, yet the relationship between differential privacy and social bias in LLMs remains poorly understood. To investigate this, we present a systematic evaluation of social bias in a pretrained LLM trained with DP-SGD, comparing a DP model against non-DP baselines across four complementary paradigms: sentence scoring, text completion, tabular classification, and question answering. We find that DP reduces bias in sentence scoring tasks, where bias is measured through controlled likelihood comparisons, yet this improvement does not generalize across all tasks. Our results reveal a discrepancy between logit-level bias and output-level bias. Moreover, decreasing memorization does not necessarily reduce unfairness, underscoring the importance of multi-paradigm evaluation when assessing fairness in LLMs.
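
One common way to run a controlled likelihood comparison is to score minimally different sentence pairs and count how often the model prefers the stereotypical variant. The sketch below assumes a hypothetical `log_prob` callable that returns a sentence's log-likelihood under the model; the metric name, the toy sentences, and the toy scores are illustrative and are not taken from the paper, whose exact benchmark and metric are not stated in the abstract.

```python
from typing import Callable, Iterable, Tuple


def stereotype_preference_rate(
    pairs: Iterable[Tuple[str, str]],
    log_prob: Callable[[str], float],
) -> float:
    """Fraction of (stereotypical, anti-stereotypical) pairs in which the
    stereotypical sentence receives the strictly higher log-likelihood."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one sentence pair")
    preferred = sum(1 for stereo, anti in pairs if log_prob(stereo) > log_prob(anti))
    return preferred / len(pairs)


# Toy stand-in for a language model: hypothetical log-likelihoods, not real model output.
TOY_LOG_PROBS = {
    "The nurse said she was tired.": -12.1,
    "The nurse said he was tired.": -13.4,
    "The engineer said he was late.": -11.0,
    "The engineer said she was late.": -11.9,
    "The teacher said she was busy.": -12.8,
    "The teacher said he was busy.": -12.5,
}

# Each pair is (stereotypical, anti-stereotypical) and differs in one word only.
PAIRS = [
    ("The nurse said she was tired.", "The nurse said he was tired."),
    ("The engineer said he was late.", "The engineer said she was late."),
    ("The teacher said she was busy.", "The teacher said he was busy."),
]

rate = stereotype_preference_rate(PAIRS, TOY_LOG_PROBS.__getitem__)
print(f"stereotype preference rate: {rate:.2f}")
```

Under this metric a rate near 0.5 indicates no systematic preference, and values well above 0.5 indicate that stereotypical variants are scored as more likely. The comparison looks only at likelihoods, so it reflects logit-level bias and says nothing about what the model actually generates.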

Problem

Research questions and friction points this paper is trying to address.

differential privacy

social bias

large language models

fairness

memorization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Differential Privacy

Social Bias

Large Language Models