CommunityFact: A Dynamic, Multilingual, Multi-domain Benchmark for Misinformation Detection in the Wild

📅 2026-05-28
🤖 AI Summary
Static benchmarks give an incomplete picture of how reliably models detect misinformation in public, fast-moving, multilingual online settings. This work introduces CommunityFact, a refreshable benchmark built around coverage, granularity, and redistributability; the current release contains 15,992 standalone claims across five languages and two domains. Ten LLMs are evaluated under varying inference-time capabilities, including thinking and web search. Closed-input verification remains challenging and web access yields the largest gains, but the sources that web-enabled LLMs select are systematically misaligned with those that human Community Notes raters converge on, a gap that closes through model-specific retrieval expansion or pruning. Performance also varies substantially across language-domain slices and evidence ecosystems, and the paper positions Community Notes as a training signal for claim-conditioned source suggesters.
📝 Abstract
Misinformation verification increasingly occurs in public, fast-moving, and multilingual online settings, where static benchmarks provide an incomplete measure of model reliability. We introduce CommunityFact, a refreshable benchmark for misinformation detection in the wild, with three major goals: coverage, granularity, and redistributability. This release contains 15,992 standalone claims across five languages and two domains. We evaluate ten LLMs under varying inference-time capabilities, including thinking and web search. Our results show that closed-input verification remains challenging, that web access yields the largest gains, and that web-enabled LLMs' source-selection policies are systematically misaligned with the sources human Community Notes raters converge on, a gap that closes through model-specific mechanisms of retrieval expansion or pruning. We further find substantial variation across language-domain slices and across the evidence ecosystems used by web-enabled systems. Beyond evaluation, CommunityFact positions Community Notes as a training signal for claim-conditioned source suggesters that could improve factual verification on novel claims.
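The source-selection gap mentioned in the abstract can be made concrete with a small sketch. The snippet below is an illustration only, not the paper's metric: it assumes each claim comes with a list of URLs cited by a model and a list of URLs cited in Community Notes, and it scores their agreement as a Jaccard overlap of domains. The names `domain` and `source_overlap` are hypothetical.

```python
from urllib.parse import urlparse


def domain(url: str) -> str:
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def source_overlap(model_urls, note_urls) -> float:
    """Jaccard overlap between domains cited by a model and by notes.

    Assumption: domain-level agreement is a reasonable proxy for
    source-selection alignment; the paper may measure this differently.
    """
    model_domains = {domain(u) for u in model_urls}
    note_domains = {domain(u) for u in note_urls}
    union = model_domains | note_domains
    if not union:
        return 0.0  # nothing cited on either side
    return len(model_domains & note_domains) / len(union)
```

Under this sketch, a score near 1 would mean the model draws on the same domains as the notes, while a score near 0 would indicate the kind of misalignment the abstract describes.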
Problem

Research questions and friction points this paper is trying to address.

misinformation detection
dynamic benchmark
multilingual
multi-domain
model reliability
Innovation

Methods, ideas, or system contributions that make the work stand out.

dynamic benchmark
multilingual misinformation detection
web-enabled LLMs
source selection alignment
Community Notes