AI Summary
This study addresses the challenge of evaluating the effectiveness of counter-speech in mitigating online hate speech. We propose the first sociologically grounded, six-dimensional evaluation framework, encompassing clarity, evidentiality, emotional appeal, and other theoretically motivated dimensions, and conduct collaborative human annotation of 4,214 counter-speech instances, resulting in the first large-scale, structured counter-speech dataset, which we publicly release. Methodologically, we introduce a novel multi-task learning architecture with dependency-aware classification to explicitly model inter-dimensional relationships. Our multi-task and dependency-based models achieve average F1 scores of 0.94 and 0.96, respectively, significantly outperforming strong baselines, and demonstrate robust generalization across both expert-crafted and user-generated counter-speech. This work advances counter-speech evaluation from binary classification toward an interpretable, sociologically informed, and structurally rich paradigm.
Abstract
Counter-speech (CS) is a key strategy for mitigating online Hate Speech (HS), yet defining the criteria to assess its effectiveness remains an open challenge. We propose a novel computational framework for CS effectiveness classification, grounded in social science concepts. Our framework defines six core dimensions (Clarity, Evidence, Emotional Appeal, Rebuttal, Audience Adaptation, and Fairness), which we use to annotate 4,214 CS instances from two benchmark datasets, resulting in a novel linguistic resource released to the community. In addition, we propose two classification strategies, multi-task and dependency-based, which achieve strong results (average F1 of 0.94 and 0.96, respectively, on both expert- and user-written CS), outperform standard baselines, and reveal strong interdependence among the dimensions.
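The dependency-based strategy described above can be sketched in miniature: a shared encoding feeds six binary heads, and a second, dependency-aware pass lets each dimension's head condition on the other dimensions' scores. This is a minimal illustrative sketch, not the paper's architecture; the encoder, head wiring, and all names here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# The paper's six effectiveness dimensions.
DIMENSIONS = ["clarity", "evidence", "emotional_appeal",
              "rebuttal", "audience_adaptation", "fairness"]

def encode(text, dim=16):
    # Stand-in for a shared text encoder (e.g. a transformer):
    # here just a normalized hashed bag-of-words, purely illustrative.
    vec = np.zeros(dim)
    for tok in text.lower().split():
        vec[hash(tok) % dim] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-9)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class DependencyAwareHeads:
    """Six binary heads over a shared encoding. The second pass
    concatenates the first pass's scores to the input, so each
    dimension can depend on the others (hypothetical wiring)."""
    def __init__(self, enc_dim=16, n_dims=6):
        self.W1 = rng.normal(0, 0.1, (n_dims, enc_dim))            # independent pass
        self.W2 = rng.normal(0, 0.1, (n_dims, enc_dim + n_dims))   # dependency-aware pass
        self.b1 = np.zeros(n_dims)
        self.b2 = np.zeros(n_dims)

    def forward(self, x):
        p_indep = sigmoid(self.W1 @ x + self.b1)   # one score per dimension
        feats = np.concatenate([x, p_indep])       # condition on sibling scores
        return sigmoid(self.W2 @ feats + self.b2)

model = DependencyAwareHeads()
scores = model.forward(encode("a calm, well-sourced rebuttal of the claim"))
preds = {d: bool(s > 0.5) for d, s in zip(DIMENSIONS, scores)}
```

In a trained version, both passes would be learned jointly with one binary cross-entropy loss per dimension, which is one common way to realize multi-task learning over related labels.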