ToxiCraft: A Novel Framework for Synthetic Generation of Harmful Information

2024-09-23
Conference on Empirical Methods in Natural Language Processing
Citations: 4
Influential: 0

AI Summary
To address data scarcity and inconsistent labeling criteria for harmful content detection in low-resource settings, this paper proposes ToxiCraft, a framework that generates high-fidelity, diverse toxic texts from minimal seed data. Methodologically, ToxiCraft introduces a synthesis paradigm that integrates semantic-controllable perturbation with toxicity-aligned distillation, combining prompt-driven generation, adversarial toxicity enhancement, consistency-based filtering, and lightweight discriminator-guided refinement. This design improves detection models' robustness to spurious features and their cross-domain generalization. Experiments across multiple benchmarks show gains in detection accuracy and robustness; models trained on the generated samples perform on par with those trained on human-annotated data, reducing reliance on large-scale manual annotation.
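The generate-then-filter idea mentioned in the summary above (prompt-driven generation followed by consistency-based filtering) can be illustrated with a minimal sketch. This is not the authors' implementation: `expand_seeds`, `make_stub_generator`, `stub_classify`, the confidence threshold, and the placeholder seed texts are all hypothetical names and values chosen for illustration. In practice the generator would presumably be a language model and the filter a trained classifier.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, int]  # (text, label); 1 = harmful, 0 = benign


def expand_seeds(
    seeds: List[Example],
    generate: Callable[[str, int], str],
    classify: Callable[[str], Tuple[int, float]],
    per_seed: int = 3,
    min_conf: float = 0.8,
) -> List[Example]:
    """Generate candidates from each seed and keep only those whose
    predicted label matches the seed label with enough confidence."""
    kept: List[Example] = []
    seen = {text for text, _ in seeds}  # avoid echoing seeds or duplicates
    for text, label in seeds:
        for _ in range(per_seed):
            candidate = generate(text, label)
            if candidate in seen:
                continue
            predicted, confidence = classify(candidate)
            if predicted == label and confidence >= min_conf:
                kept.append((candidate, label))
                seen.add(candidate)
    return kept


def make_stub_generator() -> Callable[[str, int], str]:
    """Hypothetical stand-in for a language model. Every third call
    'drifts' off-topic so the filter has something to reject."""
    calls = {"n": 0}

    def generate(seed_text: str, label: int) -> str:
        calls["n"] += 1
        if calls["n"] % 3 == 0:
            return f"unrelated text (variant {calls['n']})"
        return f"{seed_text} (variant {calls['n']})"

    return generate


def stub_classify(text: str) -> Tuple[int, float]:
    """Hypothetical stand-in for a trained detector: a keyword check
    with a fixed confidence score."""
    return (1 if "harmful" in text else 0), 0.9


# Neutral placeholder seeds; no real harmful text is needed for the sketch.
SEEDS: List[Example] = [
    ("placeholder harmful seed", 1),
    ("placeholder benign seed", 0),
]

synthetic = expand_seeds(SEEDS, make_stub_generator(), stub_classify)
for text, label in synthetic:
    print(label, text)
```

With the stubs above, the drifted candidate produced from the harmful seed is discarded because its predicted label no longer matches the seed label, which is the role a consistency filter plays in this kind of pipeline.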

Abstract
Among NLP tasks, detecting harmful content is crucial for online environments, especially with the growing influence of social media. However, previous research has two main issues: 1) a lack of data in low-resource settings, and 2) inconsistent definitions and criteria for judging harmful content, which require classification models to be robust to spurious features and to diverse content. We propose ToxiCraft, a novel framework for synthesizing datasets of harmful information to address these weaknesses. With only a small amount of seed data, our framework can generate a wide variety of synthetic yet realistic examples of toxic information. Experiments across various datasets show a notable improvement in detection model robustness and adaptability, with results close to or surpassing those obtained with gold labels.

Problem

Research questions and friction points this paper is trying to address.

Lack of data in low-resource harmful content detection
Inconsistent definitions for judging harmful information
Need robust models for diverse toxic content classification

Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates synthetic harmful content datasets
Requires minimal seed data input
Enhances detection model robustness significantly