🤖 AI Summary
Real-world fact-checking critically depends on retrieving indirect web evidence, that is, evidence that supports or refutes complex claims only through multi-hop, non-direct reasoning. Existing methods underperform on bidirectional (supporting and contradicting) and indirect evidence retrieval, and the field lacks zero-shot, open-domain evaluation benchmarks grounded in authentic production logs. To address this, we introduce FactIR: the first zero-shot, open-domain, bidirectional indirect-evidence retrieval benchmark for fact-checking, constructed from real-world verification logs at Factiverse and enriched with multi-dimensional human annotations. We systematically evaluate state-of-the-art dense and sparse retrieval models on FactIR, revealing substantial deficiencies in retrieving contradictory and indirect evidence. FactIR fills a critical gap in open-domain fact-checking retrieval evaluation, providing a reproducible benchmark and identifying key directions for model improvement.
📝 Abstract
The field of automated fact-checking increasingly depends on retrieving web-based evidence to determine the veracity of claims in real-world scenarios. A significant challenge in this process is not only retrieving relevant information but also identifying evidence that can both support and refute complex claims. Traditional retrieval methods may return documents that directly address claims or lean toward supporting them, but they often struggle with more complex claims requiring indirect reasoning. While some existing benchmarks and methods target retrieval for fact-checking, a comprehensive real-world open-domain benchmark has been lacking. In this paper, we present FactIR, a real-world retrieval benchmark derived from Factiverse production logs and enhanced with human annotations. We rigorously evaluate state-of-the-art retrieval models on FactIR in a zero-shot setup and offer insights for developing practical retrieval systems for fact-checking. Code and data are available at https://github.com/factiverse/factIR.