🤖 AI Summary
Existing automated evaluation methods struggle to assess the quality of generated counterspeech (responses countering online hate speech), particularly across multiple independent quality dimensions, because they depend on reference texts and correlate poorly with human judgement. This paper introduces CSEval, the first reference-free, multidimensional dataset and framework designed specifically for counterspeech evaluation, covering four core dimensions: contextual relevance, aggressiveness, argument coherence, and suitableness. The authors also propose Auto-Calibrated COT for Counterspeech Evaluation (ACE), a prompt-based method that uses auto-calibrated chain-of-thought reasoning in large language models to produce fine-grained scores. Compared with conventional similarity-based metrics (ROUGE, METEOR, BERTScore), ACE achieves an average 37% improvement in Pearson correlation with human judgements across all four dimensions, reducing reliance on labor-intensive human evaluation and removing the dependence on reference responses.
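To make the correlation claim concrete, meta-evaluation of a metric typically works by computing, for each dimension, the Pearson correlation between the metric's scores and human ratings on the same set of responses. The sketch below illustrates that computation with placeholder numbers; the data and dimension values are made up for illustration and are not CSEval results.

```python
# Minimal sketch of metric meta-evaluation: correlate automatic scores with
# human ratings per dimension. All numbers are placeholders, not CSEval data.
from scipy.stats import pearsonr

human_scores  = {"contextual-relevance": [4, 2, 5, 3], "aggressiveness": [1, 3, 2, 1]}
metric_scores = {"contextual-relevance": [5, 2, 4, 3], "aggressiveness": [2, 3, 2, 1]}

for dim in human_scores:
    r, p_value = pearsonr(metric_scores[dim], human_scores[dim])
    print(f"{dim}: Pearson r = {r:.2f} (p = {p_value:.3f})")
```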
📝 Abstract
Counterspeech has gained popularity as an effective approach to countering online hate speech, leading to increasing research interest in automated counterspeech generation using language models. However, this field lacks standardised evaluation protocols and robust automated evaluation metrics that align with human judgement. Current automatic evaluation methods, primarily based on similarity metrics, do not effectively capture the complex and independent attributes of counterspeech quality, such as contextual relevance, aggressiveness, or argumentative coherence. This has led to an increased dependency on labor-intensive human evaluation of automated counterspeech generation methods. To address these challenges, we introduce CSEval, a novel dataset and framework for evaluating counterspeech quality across four dimensions: contextual relevance, aggressiveness, argument coherence, and suitableness. Furthermore, we propose Auto-Calibrated COT for Counterspeech Evaluation (ACE), a prompt-based method with auto-calibrated chain-of-thoughts (CoT) for scoring counterspeech using large language models. Our experiments show that ACE outperforms traditional metrics such as ROUGE, METEOR, and BERTScore in correlation with human judgement, indicating a significant advancement in automated counterspeech evaluation.
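To illustrate what reference-free, prompt-based scoring of this kind can look like, here is a minimal sketch, not the authors' released ACE implementation: the prompt wording, the 1-5 scale, and the `call_llm` helper are hypothetical stand-ins, and the auto-calibration of the chain-of-thought rubric described in the paper is omitted.

```python
# Illustrative sketch of reference-free, prompt-based counterspeech scoring.
# NOTE: not the authors' ACE code; prompt text, scale, and call_llm are hypothetical.
import json

DIMENSIONS = {
    "contextual relevance": "How directly the counterspeech addresses the hate speech.",
    "aggressiveness": "How hostile or attacking the counterspeech itself is.",
    "argument coherence": "How logically structured and well-reasoned the argument is.",
    "suitableness": "How appropriate the response is overall as counterspeech.",
}

def build_prompt(hate_speech: str, counterspeech: str, dimension: str) -> str:
    """Compose a chain-of-thought scoring prompt for one quality dimension."""
    return (
        "You are evaluating a counterspeech response to online hate speech.\n"
        f"Hate speech: {hate_speech}\n"
        f"Counterspeech: {counterspeech}\n\n"
        f"Dimension: {dimension} -- {DIMENSIONS[dimension]}\n"
        "Reason step by step about the counterspeech on this dimension only, "
        'then output a JSON object of the form {"reasoning": "...", "score": <integer 1-5>}.'
    )

def score_counterspeech(hate_speech: str, counterspeech: str, call_llm) -> dict:
    """Score a counterspeech on all four dimensions without any reference text."""
    scores = {}
    for dim in DIMENSIONS:
        raw = call_llm(build_prompt(hate_speech, counterspeech, dim))
        scores[dim] = json.loads(raw)["score"]
    return scores
```

The key design point the sketch captures is that no gold-standard reference response is needed: the evaluator judges the candidate counterspeech directly against the hate speech, one dimension at a time, which is what distinguishes this style of evaluation from ROUGE-, METEOR-, or BERTScore-based comparison.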