🤖 AI Summary
This work presents PEACE 2.0, a tool that applies Retrieval-Augmented Generation (RAG) to both the explanation of online hate speech and the generation of counter-speech. Targeting explicit as well as implicit hate speech, it retrieves factual evidence from external sources to ground its explanations and its generated replies. Anchoring responses in verifiable information is intended to make the counter-speech more accurate and persuasive. The tool also supports exploring the characteristics of counter-speech replies, with the broader aim of enabling transparent and accountable systems for mitigating harmful content.
📝 Abstract
The increasing volume of hate speech on online platforms poses significant societal challenges. While the Natural Language Processing community has developed effective methods to automatically detect hate speech, generating responses to it, called counter-speech, remains an open challenge. We present PEACE 2.0, a novel tool that, besides analysing and explaining why a message is considered hateful or not, also generates a response to it. More specifically, PEACE 2.0 adds three main functionalities, all built on a Retrieval-Augmented Generation (RAG) pipeline: i) grounding hate speech explanations in evidence and facts, ii) automatically generating evidence-grounded counter-speech, and iii) exploring the characteristics of counter-speech replies. By integrating these capabilities, PEACE 2.0 enables in-depth analysis and response generation for both explicit and implicit hateful messages.
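To make the retrieve-then-generate idea concrete, the following is a minimal sketch of how evidence retrieval and prompt construction might fit together. It is not the PEACE 2.0 implementation: the passage store, the bag-of-words cosine retriever, and the names `PASSAGES`, `retrieve`, and `build_prompt` are all assumptions made for illustration, and the final call to a language model is left out.

```python
import math
import re
from collections import Counter

# Hypothetical evidence store with placeholder text; a real system would
# index curated external sources instead.
PASSAGES = [
    "Placeholder passage on migration and employment figures.",
    "Placeholder passage on religious minorities and crime reporting.",
    "Placeholder passage on weather patterns.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(message, passages, k=2):
    """Return the k passages most similar to the message (assumed retriever)."""
    query = Counter(tokenize(message))
    ranked = sorted(
        passages,
        key=lambda p: cosine(query, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(message, evidence):
    """Assemble a prompt asking for an explanation and a grounded reply."""
    bullets = "\n".join(f"- {e}" for e in evidence)
    return (
        "Message:\n" + message + "\n\n"
        "Evidence:\n" + bullets + "\n\n"
        "Task: explain whether the message is hateful, citing the evidence, "
        "then write a respectful counter-speech reply grounded in it."
    )


message = "Migration is ruining employment for everyone"
evidence = retrieve(message, PASSAGES, k=1)
prompt = build_prompt(message, evidence)
```

The resulting `prompt` would then be passed to a generation model. A production pipeline would likely replace the toy retriever with dense embeddings or a search index, but the overall shape (retrieve evidence, then condition the generation on it) stays the same.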