ContiGuard: A Framework for Continual Toxicity Detection Against Evolving Evasive Perturbations

📅 2026-03-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes the first continual toxicity detection framework designed to address the challenge posed by malicious users’ ever-evolving evasion perturbations, which current methods struggle to handle due to their limited adaptability and lack of sustained learning capabilities. By integrating large language model (LLM)-driven semantic augmentation with a discriminative feature learning mechanism, and incorporating continual learning techniques, the framework enables dynamic adaptation to temporally evolving perturbed texts. This approach significantly enhances the detector’s accuracy, generalization, and stability when confronted with novel evasion strategies, thereby achieving robust and sustained identification of toxic content over time.

📝 Abstract
Toxicity detection mitigates the dissemination of toxic content (e.g., hateful comments, posts, and messages in online social interactions) to safeguard a healthy online social environment. However, malicious users persistently develop evasive perturbations to disguise toxic content and evade detectors. Traditional detectors are static over time and thus inadequate against these evolving evasion tactics, so continual learning emerges as a natural approach to dynamically update detection ability against evolving perturbations. Nevertheless, disparities across perturbations hinder the detector's continual learning on perturbed text. More importantly, perturbation-induced noise distorts semantics, degrading comprehension, and also impairs critical feature learning, rendering detection sensitive to perturbations. Both amplify the challenge of continual learning against evolving perturbations. In this work, we present ContiGuard, the first framework tailored for continual learning of the detector on time-evolving perturbed text (termed continual toxicity detection), enabling the detector to continually update its capability and maintain sustained resilience against evolving perturbations. Specifically, to boost comprehension, we present an LLM-powered semantic enriching strategy that dynamically incorporates possible meanings and toxicity-related clues excavated by an LLM into the perturbed text. To mitigate non-critical features and amplify critical ones, we propose a discriminability-driven feature learning strategy that strengthens discriminative features while suppressing less-discriminative ones to shape a robust classification boundary for detection...
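The abstract's two strategies can be illustrated with a toy sketch. This is not the paper's actual method: the function names (`enrich_text`, `discriminability`, `reweight`), the clue format, and the mean-gap scoring are all assumptions made for illustration.

```python
# Hypothetical sketch of the two strategies named in the abstract.
# Neither function reflects ContiGuard's real implementation.

def enrich_text(perturbed: str, clues: list[str]) -> str:
    """Semantic enriching (sketch): append LLM-excavated meaning and
    toxicity-related clues to the perturbed text before classification."""
    return perturbed + " [clues: " + "; ".join(clues) + "]"

def discriminability(pos: list[list[float]], neg: list[list[float]]) -> list[float]:
    """Score each feature dimension by the gap between its class-conditional
    means -- a crude stand-in for a discriminability measure."""
    def mean(values: list[float]) -> float:
        return sum(values) / len(values)
    dims = len(pos[0])
    return [abs(mean([x[d] for x in pos]) - mean([x[d] for x in neg]))
            for d in range(dims)]

def reweight(features: list[float], scores: list[float], floor: float = 0.2) -> list[float]:
    """Amplify discriminative features and suppress (but not zero out)
    the less-discriminative ones."""
    top = max(scores) or 1.0  # avoid division by zero when all scores are 0
    return [f * max(s / top, floor) for f, s in zip(features, scores)]

# Toy usage: feature 0 separates the classes, feature 1 does not,
# so feature 1 is suppressed toward the floor weight.
pos = [[1.0, 0.5], [0.9, 0.4]]   # toxic examples
neg = [[0.1, 0.5], [0.2, 0.6]]   # non-toxic examples
scores = discriminability(pos, neg)
weighted = reweight([1.0, 1.0], scores)
```

Here the discriminative dimension keeps its full weight while the uninformative one is scaled down, which is the spirit (though certainly not the letter) of the paper's discriminability-driven feature learning.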
Problem

Research questions and friction points this paper is trying to address.

toxicity detection
evasive perturbations
continual learning
semantic distortion
feature learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

continual learning
toxicity detection
evasive perturbations
LLM-powered semantic enrichment
discriminability-driven feature learning
Hankun Kang
School of Computer Science, Wuhan University, Wuhan, Hubei, China
Xin Miao
School of Computer Science, Wuhan University, Wuhan, Hubei, China
Jianhao Chen
School of Computer Science, Wuhan University, Wuhan, Hubei, China; Zhongguancun Academy, Beijing, China
Jintao Wen
School of Computer Science, Wuhan University, Wuhan, Hubei, China
Mayi Xu
Wuhan University
Natural Language Processing
Weiyu Zhang
Qilu University of Technology (Shandong Academy of Sciences)
Graph Data Mining, Machine Learning, NLP, LLM
Wenpeng Lu
Key Laboratory of Computing Power Network and Information Security, Ministry of Education; Shandong Computer Science Center (National Supercomputer Center in Jinan); Qilu University of Technology (Shandong Academy of Sciences), Jinan, Shandong, China; Shandong Provincial Key Laboratory of Computing Power Internet and Service Computing; Shandong Fundamental Research Center for Computer Science, Jinan, Shandong, China
Tieyun Qian
Wuhan University
natural language processing, web data mining