🤖 AI Summary
Implicit hate speech detection remains challenging due to the absence of overtly offensive lexical cues and its strong contextual dependency. To address this, we propose a target-hierarchical attention mechanism that emulates human reasoning—first identifying targets, then analyzing their contextual relationships. Our method jointly models explicit targets (extracted via named entity recognition) and implicit targets (encoded by BERT’s [CLS] token), while enhancing target-context interactions through a custom-designed attention relation module. Departing from prevailing contrastive learning paradigms, our approach introduces the first interpretable, hierarchical attention amplification mechanism, enabling fine-grained attribution. Evaluated on multiple benchmarks, it achieves state-of-the-art performance, with average F1 scores 82.14% higher than contrastive learning baselines. It also exhibits faster convergence and produces attention heatmaps highly consistent with human annotations.
📝 Abstract
Implicit hate speech detection is challenging due to its subtlety and reliance on contextual interpretation rather than explicit offensive words. Current approaches rely on contrastive learning, which are shown to be effective on distinguishing hate and non-hate sentences. Humans, however, detect implicit hate speech by first identifying specific targets within the text and subsequently interpreting how these target relate to their surrounding context. Motivated by this reasoning process, we propose AmpleHate, a novel approach designed to mirror human inference for implicit hate detection. AmpleHate identifies explicit target using a pretrained Named Entity Recognition model and capture implicit target information via [CLS] tokens. It computes attention-based relationships between explicit, implicit targets and sentence context and then, directly injects these relational vectors into the final sentence representation. This amplifies the critical signals of target-context relations for determining implicit hate. Experiments demonstrate that AmpleHate achieves state-of-the-art performance, outperforming contrastive learning baselines by an average of 82.14% and achieve faster convergence. Qualitative analyses further reveal that attention patterns produced by AmpleHate closely align with human judgement, underscoring its interpretability and robustness.