Think Like a Person Before Responding: A Multi-Faceted Evaluation of Persona-Guided LLMs for Countering Hate

📅 2025-06-04
🤖 AI Summary
This work addresses open concerns in automated counter-narrative (CN) generation against online hate speech: affective tone, accessibility, and ethical risk. It proposes a persona-guided evaluation framework that assesses LLM-generated CNs along four dimensions: persona framing, verbosity and readability, affective tone, and ethical robustness. Using the MT-Conan and HatEval datasets, the authors evaluate three prompting strategies on GPT-4o-Mini, CommandR-7B, and LLaMA 3.1-70B. The results indicate that LLM-generated CNs are often verbose and written for readers with college-level literacy, which limits their accessibility. Emotionally guided prompts yield more empathetic and readable responses, but concerns about safety and effectiveness remain. The main contribution is a multi-dimensional evaluation approach that treats accessibility and emotional appropriateness, alongside ethical robustness, as criteria for counter-narrative systems.

📝 Abstract
Automated counter-narratives (CN) offer a promising strategy for mitigating online hate speech, yet concerns about their affective tone, accessibility, and ethical risks remain. We propose a framework for evaluating Large Language Model (LLM)-generated CNs across four dimensions: persona framing, verbosity and readability, affective tone, and ethical robustness. Using GPT-4o-Mini, Cohere's CommandR-7B, and Meta's LLaMA 3.1-70B, we assess three prompting strategies on the MT-Conan and HatEval datasets. Our findings reveal that LLM-generated CNs are often verbose and adapted for people with college-level literacy, limiting their accessibility. While emotionally guided prompts yield more empathetic and readable responses, there remain concerns surrounding safety and effectiveness.
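
To make the verbosity and readability dimension concrete, the sketch below computes a word count and a Flesch-Kincaid grade level for a candidate counter-narrative. This is an illustration only: the abstract does not say which readability metrics the authors use, so the choice of Flesch-Kincaid, the heuristic syllable counter, and the names `count_syllables` and `readability_profile` are all assumptions made for this example.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count vowel groups (heuristic, not a dictionary lookup)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent, except in endings like 'table'.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability_profile(text: str) -> dict:
    """Return word count, sentence count, and Flesch-Kincaid grade for a text.

    Assumed metric for illustration; the paper's own measures may differ.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    # Standard Flesch-Kincaid grade-level formula.
    grade = (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )
    return {
        "words": len(words),
        "sentences": len(sentences),
        "fk_grade": round(grade, 2),
    }


if __name__ == "__main__":
    simple = "We all live here. Be kind to each other."
    dense = (
        "Generalizations regarding immigrant communities systematically "
        "misrepresent demographic realities and perpetuate discriminatory "
        "institutional practices."
    )
    for label, text in (("simple", simple), ("dense", dense)):
        print(label, readability_profile(text))
```

Under this metric, a higher grade level means the text demands more schooling to read comfortably, so a counter-narrative scoring at college level would be out of reach for many of the readers it is meant to address.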

Problem

Research questions and friction points this paper is trying to address.

Evaluating persona-guided LLMs for countering hate speech
Assessing affective tone and accessibility of counter-narratives
Addressing ethical risks in LLM-generated hate speech responses

Innovation

Methods, ideas, or system contributions that make the work stand out.

Persona-guided LLM evaluation framework
Multi-dimensional CN assessment strategy
Emotionally guided prompt optimization