🤖 AI Summary
Synthetic data generated without targeted approaches often lacks the sociotechnical nuance that sensitive domains require, which limits its value for evaluating the safety and robustness of large language models (LLMs). This work proposes NodeSynth, an evidence-grounded methodology that builds a fine-grained taxonomy anchored in real-world evidence into the synthetic data generation pipeline. Using a fine-tuned taxonomy generator (TaG), NodeSynth produces socially relevant queries for stress-testing mainstream LLMs and their safety mechanisms. Across four mainstream LLMs, NodeSynth elicited failure rates up to five times higher than human-authored benchmarks. Ablation studies attribute much of this effect to the granular taxonomic expansion, and independent validation reveals critical deficiencies in prominent guard models such as Llama-Guard-3. The end-to-end research prototype and datasets are open-sourced to support further research.
📝 Abstract
Recent advancements in generative AI facilitate large-scale synthetic data generation for model evaluation. However, without targeted approaches, these datasets often lack the sociotechnical nuance required for sensitive domains. We introduce NodeSynth, an evidence-grounded methodology that generates socially relevant synthetic queries by leveraging a fine-tuned taxonomy generator (TaG) anchored in real-world evidence. Evaluated against four mainstream LLMs (e.g., Claude 4.5 Haiku), NodeSynth elicited failure rates up to five times higher than human-authored benchmarks. Ablation studies confirm that our granular taxonomic expansion significantly drives these failure rates, while independent validation reveals critical deficiencies in prominent guard models (e.g., Llama-Guard-3). We open-source our end-to-end research prototype and datasets to enable scalable, high-stakes model evaluation and targeted safety interventions (https://github.com/google-research/nodesynth).
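To make the "up to five times higher" comparison concrete, the sketch below shows one plausible way to compute a failure-rate ratio between a synthetic query set and a human-authored one. It is an illustration under stated assumptions, not the NodeSynth implementation: the `Result` record, the `"synthetic"` and `"human"` labels, and both function names are hypothetical, and the sample values are made up rather than taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """One evaluated query (hypothetical record layout)."""
    source: str   # "synthetic" or "human"; labels are assumptions
    failed: bool  # True if the model's response was judged a failure


def failure_rate(results, source):
    """Fraction of queries from `source` whose response was judged a failure."""
    subset = [r for r in results if r.source == source]
    if not subset:
        raise ValueError(f"no results for source {source!r}")
    return sum(r.failed for r in subset) / len(subset)


def failure_ratio(results, treatment="synthetic", baseline="human"):
    """Failure rate of `treatment` divided by the failure rate of `baseline`."""
    base = failure_rate(results, baseline)
    if base == 0:
        return float("inf")  # baseline never failed; ratio is unbounded
    return failure_rate(results, treatment) / base


# Made-up toy data, not results from the paper.
toy_results = (
    [Result("synthetic", True)] * 1 + [Result("synthetic", False)] * 1
    + [Result("human", True)] * 1 + [Result("human", False)] * 7
)

if __name__ == "__main__":
    print(failure_ratio(toy_results))
```

A ratio computed this way depends on how a "failure" is judged. The abstract does not specify the judging procedure, so the boolean `failed` field here stands in for whatever rater or classifier an evaluation actually uses.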