🤖 AI Summary
Problem: Current NLP research on counter-hate speech is critically misaligned with the needs of the communities most affected by online harm, as evidenced by reliance on outdated datasets and the exclusion of key stakeholders, particularly frontline NGOs and impacted communities, from model development and evaluation. Method: We conduct a systematic review of 74 NLP studies to document this misalignment, and we collaborate with five NGOs specialising in online gender-based violence through a participatory case study involving surveys, qualitative interviews, and co-design workshops. Contribution/Results: We introduce the first "stakeholder-driven" research framework for counterspeech, comprising practice guidelines spanning data curation, model design, and ethical assessment. Further, we propose concrete technical pathways and governance recommendations to re-centre community agency, thereby advancing socially responsive AI.
📝 Abstract
Counterspeech, i.e. the practice of responding to online hate speech, has gained traction in NLP as a promising intervention. While early work emphasised collaboration with non-governmental organisation (NGO) stakeholders, recent research has shifted toward automated pipelines that reuse a small set of legacy datasets, often without input from affected communities. This paper presents a systematic review of 74 NLP studies on counterspeech, analysing the extent to which stakeholder participation influences dataset creation, model development, and evaluation. To complement this analysis, we conducted a participatory case study with five NGOs specialising in online Gender-Based Violence (oGBV), identifying stakeholder-informed practices for counterspeech generation. Our findings reveal a growing disconnect between current NLP research and the needs of the communities most impacted by toxic online content. We conclude with concrete recommendations for re-centring stakeholder expertise in counterspeech research.