BLUFF: Benchmarking the Detection of False and Synthetic Content across 58 Low-Resource Languages

📅 2026-02-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the critical gap in detecting deceptive and synthetic content for low-resource languages, an area largely overlooked by existing research focused on high-resource settings. The authors introduce a multilingual benchmark dataset spanning 79 languages with 202,000 samples, systematically integrating both human-written and large language model–generated text across the resource spectrum. They propose AXL-CoI, a multi-agent framework that leverages 19 large language models and 39 textual perturbation strategies—including 36 disinformation manipulation techniques and 3 AI-based editing methods—coupled with the mPURIFY quality filtering pipeline to ensure data reliability. Experiments reveal that current detectors suffer performance degradation of up to 25.3% in F1 score on low-resource languages. The work releases the dataset, evaluation toolkit, and documentation to advance equitable multilingual research in this domain.

📝 Abstract
Multilingual falsehoods threaten information integrity worldwide, yet detection benchmarks remain confined to English or a few high-resource languages, leaving low-resource linguistic communities without robust defense tools. We introduce BLUFF, a comprehensive benchmark for detecting false and synthetic content, spanning 79 languages with over 202K samples, combining human-written fact-checked content (122K+ samples across 57 languages) and LLM-generated content (79K+ samples across 71 languages). BLUFF uniquely covers both high-resource "big-head" (20) and low-resource "long-tail" (59) languages, addressing critical gaps in multilingual research on detecting false and synthetic content. Our dataset features four content types (human-written, LLM-generated, LLM-translated, and hybrid human-LLM text), bidirectional translation (English↔X), 39 textual modification techniques (36 manipulation tactics for fake news, 3 AI-editing strategies for real news), and varying edit intensities generated using 19 diverse LLMs. We present AXL-CoI (Adversarial Cross-Lingual Agentic Chain-of-Interactions), a novel multi-agentic framework for controlled fake/real news generation, paired with mPURIFY, a quality filtering pipeline ensuring dataset integrity. Experiments reveal state-of-the-art detectors suffer up to 25.3% F1 degradation on low-resource versus high-resource languages. BLUFF provides the research community with a multilingual benchmark, extensive linguistic-oriented benchmark evaluation, comprehensive documentation, and open-source tools to advance equitable falsehood detection. Dataset and code are available at: https://jsl5710.github.io/BLUFF/
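The headline result is a relative F1 gap between language groups. A minimal sketch of how such a degradation figure can be computed is below; the labels and predictions here are invented for illustration and are not the paper's data, and the paper may aggregate F1 differently (e.g., macro-averaging over languages).

```python
# Hypothetical sketch: relative F1 degradation between high- and
# low-resource language groups. All labels/predictions are invented.

def f1_score(y_true, y_pred):
    """Binary F1 for the positive class (1 = false/synthetic content)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

# Invented detector outputs for two language groups.
gold = [1, 1, 0, 0, 1, 0, 1, 0]
high = f1_score(gold, [1, 1, 0, 0, 1, 0, 1, 1])  # high-resource group
low  = f1_score(gold, [1, 0, 0, 1, 0, 0, 1, 1])  # low-resource group

# Relative F1 drop, in percent, of low-resource vs. high-resource.
degradation = (high - low) / high * 100
print(f"high={high:.3f} low={low:.3f} degradation={degradation:.1f}%")
```

With real per-language predictions, the same computation over the "big-head" and "long-tail" partitions would yield the kind of gap (up to 25.3%) the benchmark reports.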
Problem

Research questions and friction points this paper is trying to address.

false content detection
synthetic content
low-resource languages
multilingual benchmark
information integrity
Innovation

Methods, ideas, or system contributions that make the work stand out.

multilingual benchmark
low-resource languages
synthetic content detection
multi-agent generation
adversarial cross-lingual framework