Algorithmic Fairness in NLP: Persona-Infused LLMs for Human-Centric Hate Speech Detection

📅 2025-10-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses algorithmic bias in hate speech detection arising from annotator–target group identity mismatches. We propose a fairness optimization method grounded in persona modeling, integrating social-psychological group identity theory into NLP. Specifically, we construct personalized large language models (Persona-LLMs) that explicitly incorporate annotators' sociodemographic attributes (e.g., gender, race, religion) via shallow persona prompting and RAG-enhanced deep contextualized persona modeling. Experiments on Gemini and GPT-4.1-mini show significant improvements in cross-group fairness, particularly reduced false positives on minority-group texts, demonstrating both the efficacy and the practical limits of persona-based modeling for mitigating identity-related bias. Our core contribution is a novel, interpretable, and controllable identity-aware detection paradigm, offering both theoretical insights and a technical framework for fair NLP.
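The shallow persona prompting described above can be illustrated with a minimal sketch. The persona fields and prompt wording below are hypothetical stand-ins, not the paper's exact template:

```python
# Hedged sketch of shallow persona prompting for hate speech annotation.
# The persona attributes and instruction wording are illustrative assumptions.

def build_persona_prompt(persona: dict, text: str) -> str:
    """Compose an annotator-persona instruction plus the classification task."""
    persona_desc = ", ".join(f"{k}: {v}" for k, v in persona.items())
    return (
        f"You are an annotator with the following profile: {persona_desc}.\n"
        "From this perspective, label the following text as HATE or NOT_HATE.\n\n"
        f"Text: {text}\nLabel:"
    )

prompt = build_persona_prompt(
    {"gender": "female", "race": "Black", "religion": "Christian"},
    "example input text",
)
print(prompt)
```

The resulting string would be sent as the model input (e.g., to Gemini or GPT-4.1-mini); swapping the persona dict yields in-group versus out-group annotator conditions.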

📝 Abstract
In this paper, we investigate how personalising Large Language Models (Persona-LLMs) with annotator personas affects their sensitivity to hate speech, particularly regarding biases linked to shared or differing identities between annotators and targets. To this end, we employ Google's Gemini and OpenAI's GPT-4.1-mini models and two persona-prompting methods: shallow persona prompting and a deeply contextualised persona development based on Retrieval-Augmented Generation (RAG) to incorporate richer persona profiles. We analyse the impact of using in-group and out-group annotator personas on the models' detection performance and fairness across diverse social groups. This work bridges psychological insights on group identity with advanced NLP techniques, demonstrating that incorporating socio-demographic attributes into LLMs can address bias in automated hate speech detection. Our results highlight both the potential and limitations of persona-based approaches in reducing bias, offering valuable insights for developing more equitable hate speech detection systems.
Problem

Research questions and friction points this paper is trying to address.

How does personalising LLMs with annotator personas affect their sensitivity to hate speech?
How do in-group versus out-group annotator personas impact detection performance and fairness across social groups?
Can incorporating socio-demographic attributes into LLMs mitigate bias in automated hate speech detection?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Persona-LLMs: personalising models with annotators' sociodemographic personas
RAG-based retrieval for deeply contextualised persona profiles beyond shallow prompting
Identity-aware detection paradigm that mitigates annotator–target identity bias
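The RAG-enhanced deep persona modelling listed above can be sketched as retrieving background passages most similar to a persona description and prepending them to the prompt. The toy corpus and bag-of-words cosine scoring below are illustrative assumptions standing in for a real vector store and embedding model:

```python
# Hedged sketch of RAG-style persona enrichment: score candidate background
# passages against a persona/query description and keep the top-k as context.
import math
import re
from collections import Counter

def tokens(s: str) -> Counter:
    """Lowercased word counts, ignoring punctuation."""
    return Counter(re.findall(r"\w+", s.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k corpus passages most similar to the query."""
    q = tokens(query)
    return sorted(corpus, key=lambda d: cosine(q, tokens(d)), reverse=True)[:k]

corpus = [  # hypothetical persona-background passages
    "Experiences of religious minorities with online harassment",
    "Slang commonly reclaimed within in-group speech",
    "History of soccer tactics in the 1970s",
]
context = retrieve("religious minority annotator online speech", corpus, k=2)
```

The retrieved `context` passages would then be concatenated with the persona description before classification, giving the model richer in-group background than a one-line shallow persona.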