ToxiCraft: A Novel Framework for Synthetic Generation of Harmful Information

📅 2024-09-23

🏛️ Conference on Empirical Methods in Natural Language Processing

📈 Citations: 4

✨ Influential: 0

🤖 AI Summary

To address data scarcity and inconsistent labeling criteria for harmful content detection in low-resource settings, this paper proposes ToxiCraft, a framework that generates high-fidelity, diverse toxic texts from minimal seed data. Methodologically, ToxiCraft introduces a novel synthesis paradigm integrating semantic-controllable perturbation with toxicity-aligned distillation, combining prompt-driven generation, adversarial toxicity enhancement, consistency-based filtering, and lightweight discriminator-guided refinement. This design significantly improves detection models' robustness to spurious features and their cross-domain generalization. Experiments across multiple benchmarks demonstrate substantial gains in detection accuracy and robustness; models trained on the generated samples perform on par with those trained on human-annotated data, reducing reliance on large-scale manual annotation.

📝 Abstract

Among NLP tasks, detecting harmful content is crucial for online environments, especially given the growing influence of social media. However, previous research has two main issues: 1) a lack of data in low-resource settings, and 2) inconsistent definitions and criteria for judging harmful content, which require classification models to be robust to spurious features and to diverse content. We propose ToxiCraft, a novel framework for synthesizing datasets of harmful information to address these weaknesses. With only a small amount of seed data, our framework can generate a wide variety of synthetic, yet remarkably realistic, examples of toxic information. Experiments across various datasets show a notable improvement in detection model robustness and adaptability, with results that surpass or come close to those obtained with gold labels.

Problem

Research questions and friction points this paper is trying to address.

Lack of data in low-resource harmful content detection

Inconsistent definitions for judging harmful information

Need robust models for diverse toxic content classification

Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates synthetic harmful content datasets

Requires minimal seed data input

Enhances detection model robustness significantly
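
The general idea of growing a dataset from a handful of seeds can be illustrated with a minimal sketch. This page does not describe the paper's actual prompts, models, or filters, so everything below is an assumption made for illustration: the function names (build_prompt, is_near_duplicate, synthesize), the prompt wording, the 0.9 similarity threshold, and the use of simple string similarity as a stand-in for a real filtering step are all hypothetical. The generator is passed in as a plain callable so that any text-generation backend could be plugged in; the seeds shown are neutral placeholders.

```python
from difflib import SequenceMatcher
from typing import Callable, List


def build_prompt(seeds: List[str], n_new: int) -> str:
    """Format seed examples into a few-shot prompt (hypothetical wording)."""
    shots = "\n".join(f"- {s}" for s in seeds)
    return (
        "The following are labeled examples from a harmful-content dataset:\n"
        f"{shots}\n"
        f"Write {n_new} new examples in the same style, one per line."
    )


def is_near_duplicate(candidate: str, pool: List[str], threshold: float = 0.9) -> bool:
    """Return True if candidate is almost identical to any text in pool.

    String similarity is only a stand-in for a real filter; the threshold
    is an arbitrary illustrative value.
    """
    return any(
        SequenceMatcher(None, candidate.lower(), p.lower()).ratio() >= threshold
        for p in pool
    )


def synthesize(
    seeds: List[str],
    generate: Callable[[str], List[str]],
    n_new: int = 5,
) -> List[str]:
    """Generate candidates from seeds and keep only non-empty, novel ones."""
    prompt = build_prompt(seeds, n_new)
    kept: List[str] = []
    for text in generate(prompt):
        text = text.strip()
        # Drop empty outputs and anything too close to a seed or a kept sample.
        if text and not is_near_duplicate(text, seeds + kept):
            kept.append(text)
    return kept


# Usage with a stub generator standing in for a language model call.
seeds = ["placeholder seed A", "placeholder seed B"]


def fake_generate(prompt: str) -> List[str]:
    return [
        "placeholder seed A",                    # copy of a seed, filtered out
        "a clearly different synthetic sample",  # novel, kept
        "a clearly different synthetic sample ", # repeat after stripping, filtered out
        "",                                      # empty, filtered out
    ]


result = synthesize(seeds, fake_generate)
```

Passing the generator as a callable keeps the sketch independent of any particular model API; a real pipeline would replace both the stub and the string-similarity filter with task-appropriate components.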

🔎 Similar Papers

Contrastive Perplexity for Controlled Generation: An Application in Detoxifying Large Language Models