🤖 AI Summary
Traditional lexicon-based hate speech detection relies on static vocabulary lists, so it lags behind the continual evolution of abusive language. To close this gap, this paper proposes an adaptive online detection and mitigation framework. Methodologically, it introduces (1) a word-embedding-driven dynamic lexicon evolution mechanism that identifies novel abusive terms and orthographic variants in real time and adds them to the lexicon incrementally; and (2) a hybrid model that integrates BERT-based semantic modeling with dynamic lexicon matching, augmented with context-aware spelling correction and semantic similarity computation. Evaluated on mainstream benchmark datasets, the framework achieves 95% accuracy, markedly improving both detection rate and timeliness for emerging hate speech. It robustly handles challenges including target-group migration and linguistic variation, thereby enhancing adaptability in rapidly shifting sociolinguistic contexts.
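The embedding-driven lexicon update described above can be sketched as a nearest-neighbor search: an out-of-lexicon word is flagged as a candidate abusive term when its vector is sufficiently close to a known term. The toy vectors, the `expand_lexicon` helper, and the 0.95 threshold below are illustrative assumptions, not the paper's actual implementation; a real system would learn embeddings from a streaming corpus (e.g. word2vec) and tune the threshold on held-out data.

```python
import math

# Hypothetical 3-d word vectors for illustration only; real embeddings
# would be trained on platform text and have hundreds of dimensions.
embeddings = {
    "slur_a":  [0.90, 0.10, 0.00],
    "slur_a1": [0.88, 0.15, 0.02],  # orthographic variant of slur_a
    "newterm": [0.85, 0.20, 0.05],  # emerging abusive term
    "weather": [0.00, 0.20, 0.95],  # benign word, far from the lexicon
}

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def expand_lexicon(lexicon, embeddings, threshold=0.95):
    """Return the lexicon plus out-of-lexicon words whose embedding is
    within `threshold` cosine similarity of some known abusive term."""
    seeds = [w for w in lexicon if w in embeddings]
    new_terms = {
        word
        for word, vec in embeddings.items()
        if word not in lexicon
        and any(cosine(vec, embeddings[s]) >= threshold for s in seeds)
    }
    return lexicon | new_terms

updated = expand_lexicon({"slur_a"}, embeddings)
# "slur_a1" and "newterm" are pulled in; "weather" stays out.
```

In an online setting this check would run incrementally as new vocabulary appears in the stream, which is what lets the lexicon track target-group migration without manual curation.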
📝 Abstract
The proliferation of social media platforms has led to an increase in the spread of hate speech, particularly targeting vulnerable communities. Unfortunately, existing methods for automatically identifying and blocking toxic language rely on pre-constructed lexicons, making them reactive rather than adaptive. As such, these approaches become less effective over time, especially when new communities are targeted with slurs not included in the original datasets. To address this issue, we present an adaptive approach that uses word embeddings to update lexicons, paired with a hybrid model that adjusts to emerging slurs and new linguistic patterns. This approach can effectively detect toxic language, including intentional spelling mistakes employed by aggressors to avoid detection. Our hybrid model, which combines BERT with lexicon-based techniques, achieves an accuracy of 95% on most state-of-the-art datasets. Our work has significant implications for creating safer online environments by improving the detection of toxic content and proactively updating the lexicon. Content Warning: This paper contains examples of hate speech that may be triggering.
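The lexicon side of the hybrid model must catch intentional misspellings such as character substitutions. A minimal sketch of that idea, assuming a leetspeak normalization table and a Levenshtein edit-distance tolerance (both illustrative choices, not the paper's exact spelling-correction method):

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic one-row dynamic program."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,            # deletion
                dp[j - 1] + 1,        # insertion
                prev + (ca != cb),    # substitution (free if chars match)
            )
    return dp[-1]

# Assumed substitution table: common character swaps used to evade filters.
LEET = str.maketrans("013@$", "oieas")

def normalize(token: str) -> str:
    """Lowercase and undo simple character substitutions."""
    return token.lower().translate(LEET)

def match_lexicon(token: str, lexicon: set[str], max_dist: int = 1) -> bool:
    """True if the normalized token is within `max_dist` edits
    of any lexicon entry."""
    t = normalize(token)
    return any(edit_distance(t, w) <= max_dist for w in lexicon)

# e.g. match_lexicon("H@t3s", {"hate"}) matches: "H@t3s" normalizes
# to "hates", which is one edit away from the lexicon entry "hate".
```

In the full model, a match like this would be combined with the BERT classifier's contextual score, so an obfuscated slur is caught even when the transformer alone is uncertain about the rare surface form.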