NodeSynth: Socially Aligned Synthetic Data for AI Evaluation

📅 2026-05-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing synthetic data often lacks fine-grained sociotechnical variation in sensitive domains, which limits its usefulness for evaluating the safety and robustness of large language models (LLMs). This work proposes NodeSynth, a method that integrates a fine-grained taxonomy grounded in real-world evidence into the synthetic data generation pipeline. Using a fine-tuned taxonomy generator (TaG), NodeSynth produces socially relevant, high-risk queries for stress-testing mainstream LLMs and their safety mechanisms. In the empirical evaluation, NodeSynth elicits failure rates up to five times higher than human-authored benchmarks across four mainstream LLMs, and independent validation exposes critical deficiencies in prominent guard models such as Llama-Guard-3. The end-to-end research prototype and datasets are publicly released to support further research.

📝 Abstract

Recent advancements in generative AI facilitate large-scale synthetic data generation for model evaluation. However, without targeted approaches, these datasets often lack the sociotechnical nuance required for sensitive domains. We introduce NodeSynth, an evidence-grounded methodology that generates socially relevant synthetic queries by leveraging a fine-tuned taxonomy generator (TaG) anchored in real-world evidence. Evaluated against four mainstream LLMs (e.g., Claude 4.5 Haiku), NodeSynth elicited failure rates up to five times higher than human-authored benchmarks. Ablation studies confirm that our granular taxonomic expansion significantly drives these failure rates, while independent validation reveals critical deficiencies in prominent guard models (e.g., Llama-Guard-3). We open-source our end-to-end research prototype and datasets to enable scalable, high-stakes model evaluation and targeted safety interventions (https://github.com/google-research/nodesynth).
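To make the two ideas in the abstract concrete (expanding a taxonomy into fine-grained nodes that condition query generation, and comparing failure rates between synthetic and human-authored queries), here is a minimal sketch. It is not the NodeSynth implementation: `TaxonomyNode`, `leaf_paths`, and `failure_rate` are hypothetical names, the taxonomy labels are placeholders, and the outcome lists are made-up values used only to show the arithmetic.

```python
from dataclasses import dataclass, field


@dataclass
class TaxonomyNode:
    """One node in a hypothetical topic taxonomy (not the paper's schema)."""
    name: str
    children: list["TaxonomyNode"] = field(default_factory=list)


def leaf_paths(node, prefix=()):
    """Yield the root-to-leaf path of every leaf under `node`."""
    path = prefix + (node.name,)
    if not node.children:
        yield path
        return
    for child in node.children:
        yield from leaf_paths(child, path)


def failure_rate(outcomes):
    """Fraction of evaluated queries judged as failures (True = failure)."""
    if not outcomes:
        raise ValueError("no outcomes to score")
    return sum(outcomes) / len(outcomes)


# Toy taxonomy; the labels are placeholders, not taken from the paper.
root = TaxonomyNode("domain", [
    TaxonomyNode("topic_a", [
        TaxonomyNode("subtopic_a1"),
        TaxonomyNode("subtopic_a2"),
    ]),
    TaxonomyNode("topic_b"),
])

# Each leaf path would condition one or more generated queries.
prompts = [" > ".join(path) for path in leaf_paths(root)]

# Made-up judgments, purely illustrative (True = the model response failed).
synthetic_outcomes = [True, False, True, False]
human_outcomes = [True, False, False, False, False, False, False, False]

# How many times higher the synthetic failure rate is than the human one.
ratio = failure_rate(synthetic_outcomes) / failure_rate(human_outcomes)
```

In this sketch a finer taxonomy simply yields more leaf paths, and therefore more distinct conditioning contexts for a generator; how the actual TaG model expands and uses its taxonomy is described in the paper and repository, not here.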

Problem

Research questions and friction points this paper is trying to address.

synthetic data

AI evaluation

sociotechnical nuance

model safety

social alignment

Innovation

Methods, ideas, or system contributions that make the work stand out.

synthetic data generation

sociotechnical evaluation

taxonomy-guided generation