AI Summary
To address the scarcity of labeled data in harmful text classification, this paper proposes GRAID, a synthetic data generation framework that integrates geometrically constrained generation with a multi-agent reflective reasoning mechanism. GRAID enforces geometric constraints in the embedding space to ensure systematic coverage of the input space, while collaborative multi-agent reasoning identifies and refines edge-case instances, substantially improving the diversity, representativeness, and semantic fidelity of the synthetic data. The framework operates as a two-stage, LLM-driven pipeline: (1) geometrically constrained text generation and (2) interactive reflective augmentation. Experiments on two mainstream benchmarks demonstrate that classifiers trained on GRAID-augmented data achieve significantly higher detection accuracy and adversarial robustness than state-of-the-art baselines. These results validate GRAID's effectiveness and generalizability for content safety applications.
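The "systematic coverage of the input space" idea can be illustrated with a small sketch. Here, candidate synthetic examples are represented by their embeddings, and a subset is chosen by greedy farthest-point sampling so that selected points spread out over the embedding space. The function name and the selection rule are illustrative assumptions for exposition, not GRAID's actual geometric constraint:

```python
# Hypothetical sketch: pick synthetic candidates that maximize embedding-space
# coverage via greedy farthest-point sampling. This is an assumed stand-in for
# the paper's geometric constraint, not its actual implementation.
import numpy as np

def farthest_point_sample(embeddings: np.ndarray, k: int) -> list[int]:
    """Greedily select k indices so each new pick is farthest from those chosen."""
    chosen = [0]  # seed with the first candidate
    # distance of every point to its nearest already-chosen point
    dist = np.linalg.norm(embeddings - embeddings[0], axis=1)
    while len(chosen) < k:
        idx = int(np.argmax(dist))          # farthest from the current selection
        chosen.append(idx)
        new_dist = np.linalg.norm(embeddings - embeddings[idx], axis=1)
        dist = np.minimum(dist, new_dist)   # update nearest-chosen distances
    return chosen

# Toy example: 2D "embeddings" of 100 synthetic candidates
rng = np.random.default_rng(0)
points = rng.normal(size=(100, 2))
subset = farthest_point_sample(points, 10)
```

In a real pipeline the embeddings would come from a sentence encoder over LLM-generated texts, and the selected subset would seed further generation rounds.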
Abstract
We address the problem of data scarcity in harmful text classification for guardrailing applications and introduce GRAID (Geometric and Reflective AI-Driven Data Augmentation), a novel pipeline that leverages Large Language Models (LLMs) for dataset augmentation. GRAID consists of two stages: (i) generation of geometrically controlled examples with a constrained LLM, and (ii) augmentation through a multi-agentic reflective process that promotes stylistic diversity and uncovers edge cases. Together, these stages provide reliable coverage of the input space and nuanced exploration of harmful content. On two benchmark datasets, we demonstrate that augmenting a harmful text classification dataset with GRAID yields significant improvements in downstream guardrail model performance.