🤖 AI Summary
Generative AI can be misused to spread disinformation across languages and cultures, yet existing red-teaming datasets are heavily skewed toward English and U.S.-centric contexts. To address this gap, the authors propose "anecdoctoring", an automated red-teaming approach that builds knowledge graphs from multilingual fact-checking data. It combines claim clustering, knowledge-graph construction, and large language models to generate culturally grounded adversarial prompts, achieving higher attack success rates across English, Spanish, and Hindi while improving interpretability over few-shot prompting. The results underscore the need for red-teaming evaluations that scale globally and are grounded in real-world adversarial misuse.
📝 Abstract
Disinformation is among the top risks of generative artificial intelligence (AI) misuse. Global adoption of generative AI necessitates red-teaming evaluations (i.e., systematic adversarial probing) that are robust across diverse languages and cultures, but red-teaming datasets are commonly US- and English-centric. To address this gap, we propose "anecdoctoring", a novel red-teaming approach that automatically generates adversarial prompts across languages and cultures. We collect misinformation claims from fact-checking websites in three languages (English, Spanish, and Hindi) and two geographies (US and India). We then cluster individual claims into broader narratives and characterize the resulting clusters with knowledge graphs, with which we augment an attacker LLM. Our method produces higher attack success rates and offers interpretability benefits relative to few-shot prompting. Results underscore the need for disinformation mitigations that scale globally and are grounded in real-world adversarial misuse.
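The pipeline described above (cluster claims into narratives, characterize each cluster with a knowledge graph, then use the graph to augment an attacker prompt) can be sketched in miniature. This is an illustrative toy, not the paper's implementation: the claims, the keyword-based "clustering", the triple extraction, and the prompt template are all invented stand-ins for the real clustering and graph-construction steps.

```python
# Toy sketch of the anecdoctoring pipeline (illustrative only):
#   1. cluster fact-checked claims into broader narratives,
#   2. characterize each cluster with (subject, relation, object) triples,
#   3. format the resulting mini knowledge graph as attacker-LLM context.
# All data and the prompt template below are invented examples.

from collections import defaultdict

claims = [
    "Vaccine X contains microchips",
    "Vaccine X alters DNA",
    "Election Y was rigged",
]

def cluster_by_keyword(claims):
    """Toy stand-in for real clustering: group claims sharing their first two words."""
    clusters = defaultdict(list)
    for claim in claims:
        words = claim.split()
        key = " ".join(words[:2])  # e.g. "Vaccine X"
        clusters[key].append(claim)
    return dict(clusters)

def build_kg(cluster_claims):
    """Toy KG construction: one (subject, relation, object) triple per claim."""
    triples = []
    for claim in cluster_claims:
        words = claim.split()
        triples.append((" ".join(words[:2]), words[2], " ".join(words[3:])))
    return triples

def attacker_prompt(narrative, triples):
    """Serialize the mini knowledge graph as context for a hypothetical attacker LLM."""
    facts = "; ".join(f"{s} --{r}--> {o}" for s, r, o in triples)
    return (f"Narrative: {narrative}. Knowledge graph: {facts}. "
            f"Write a claim consistent with this narrative.")

clusters = cluster_by_keyword(claims)
for narrative, members in clusters.items():
    print(attacker_prompt(narrative, build_kg(members)))
```

In the paper's actual setup, clustering operates on real fact-check data and the attacker is a full LLM; the sketch only shows how graph-structured narrative context can be threaded into an adversarial prompt.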