Personalisation or Prejudice? Addressing Geographic Bias in Hate Speech Detection using Debias Tuning in Large Language Models

📅 2025-05-04

📈 Citations: 0

✨ Influential: 0

career value

204K/year

🤖 AI Summary

This work identifies geographically induced bias in large language model (LLM)-based hate speech detection, arising from personalized contextual cues such as user nationality and language. To mitigate this bias, we propose *Debias Tuning*: a fine-tuning framework that constructs geo-contextual prompt pairs using multilingual persona templates and enforces decision consistency across contextualized and decontextualized inputs via a contrastive learning–driven consistency loss. Our method is the first to jointly improve cross-context fairness and generalization. Evaluated on multilingual hate speech benchmarks, Debias Tuning reduces geographic decision inconsistency by 38.2% on average while improving F1-score by 2.1% in zero-context settings—demonstrating both efficacy and practical utility.

Technology Category

Application Category

📝 Abstract

Commercial Large Language Models (LLMs) have recently incorporated memory features to deliver personalised responses. This memory retains details such as user demographics and individual characteristics, allowing LLMs to adjust their behaviour based on personal information. However, the impact of integrating personalised information into the context has not been thoroughly assessed, leading to questions about its influence on LLM behaviour. Personalisation can be challenging, particularly with sensitive topics. In this paper, we examine various state-of-the-art LLMs to understand their behaviour in different personalisation scenarios, specifically focusing on hate speech. We prompt the models to assume country-specific personas and use different languages for hate speech detection. Our findings reveal that context personalisation significantly influences LLMs' responses in this sensitive area. To mitigate these unwanted biases, we fine-tune the LLMs by penalising inconsistent hate speech classifications made with and without country or language-specific context. The refined models demonstrate improved performance in both personalised contexts and when no context is provided.

Problem

Research questions and friction points this paper is trying to address.

Addressing geographic bias in hate speech detection

Assessing impact of personalization on LLM behavior

Mitigating biases via debias tuning in LLMs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Debias tuning to mitigate geographic bias

Fine-tuning LLMs with penalty for inconsistency

Country-specific personas for hate speech detection

🔎 Similar Papers

From Prejudice to Parity: A New Approach to Debiasing Large Language Model Word Embeddings