From Articles to Premises: Building PrimeFacts, an Extraction Methodology and Resource for Fact-Checking Evidence

📅 2026-05-07

🤖 AI Summary
This work addresses the problem that supporting evidence in fact-checking articles is presented in unstructured form, which limits its use by automated verification systems. The authors introduce PrimeFacts, a methodology and resource that turns this evidence into stand-alone, context-independent premises. It uses in-article hyperlinks as anchors to locate key evidence sentences, applies large language models to rewrite those sentences so that they no longer depend on surrounding context, and investigates the extraction of additional implicit evidence. In evaluations, decontextualized premises achieve up to a 30% relative gain in Mean Reciprocal Rank over verbatim sentences for cross-article retrieval, and using the evidence for verdict prediction raises Macro-F1 by 10-20 points over the baseline. The gains hold across verdict granularities (2-class vs. 5-class) and model architectures, and a qualitative analysis indicates that the premises remain faithful to the original sources.
📝 Abstract
Fact-checking articles encode rich supporting evidence and reasoning, yet this evidence remains largely inaccessible to automated verification systems due to unstructured presentation. We introduce PrimeFacts, a methodology and resource for extracting fine-grained evidence from full fact-checking articles. We compile 13,106 PolitiFact articles with claims, verdicts, and all referenced sources, and we identify 49,718 in-article hyperlinks as natural anchors to pinpoint key evidence. Our framework leverages large language models (LLMs) to rewrite these anchor sentences into stand-alone, context-independent premises and investigates the extraction of additional implicit evidence. In evaluations on cross-article evidence retrieval and claim verification, the extracted premises substantially improve performance. Decontextualized evidence yields higher retrievability, achieving up to a 30 percent relative gain in Mean Reciprocal Rank over verbatim sentences, and using the evidence for verdict prediction raises Macro-F1 by 10-20 points over the baseline. These gains are consistent across different verdict granularities (2-class vs. 5-class) and model architectures. A qualitative analysis indicates that the decontextualized premises remain faithful to the original sources. Our work highlights the promise of reusing fact-checkers' evidence for automation and provides a large-scale resource of structured evidence from real-world fact-checks.
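The two metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the one-gold-item-per-query setup, and the toy inputs are assumptions made for illustration only.

```python
from typing import Hashable, Sequence


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[Hashable]], gold: Sequence[Hashable]
) -> float:
    """Average of 1/rank of the gold item per query (0 if it is not retrieved).

    Assumes exactly one relevant item per query (an illustrative simplification).
    """
    if not rankings:
        return 0.0
    total = 0.0
    for ranked, target in zip(rankings, gold):
        for position, candidate in enumerate(ranked, start=1):
            if candidate == target:
                total += 1.0 / position
                break
    return total / len(rankings)


def relative_gain(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline` (0.3 means 30 percent)."""
    return (improved - baseline) / baseline


def macro_f1(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over all labels seen in gold or predictions."""
    labels = sorted(set(gold) | set(predicted))
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Because Macro-F1 weights every class equally, it is sensitive to performance on rare verdict labels, which matters more in a 5-class setting than in a 2-class one. A "relative gain" in Mean Reciprocal Rank is a ratio to the baseline score, whereas the Macro-F1 improvement in the abstract is stated in absolute points.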
Problem

Research questions and friction points this paper is trying to address.

fact-checking
evidence extraction
unstructured data
automated verification
decontextualization
Innovation

Methods, ideas, or system contributions that make the work stand out.

evidence extraction
decontextualization
fact-checking
large language models
structured evidence
Premtim Sahitaj
Technische Universität Berlin, Quality and Usability Lab, Berlin, Germany
Jawan Kolanowski
Harz University of Applied Sciences, Wernigerode, Germany
Ariana Sahitaj
Technische Universität Berlin, Quality and Usability Lab, Berlin, Germany
Veronika Solopova
Technische Universität Berlin
Computational linguistics, Ethics of AI
Max Upravitelev
Technische Universität Berlin, Quality and Usability Lab, Berlin, Germany
Daniel Röder
Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Berlin, Germany
Iffat Maab
National Institute of Informatics, Tokyo, Japan
Junichi Yamagishi
National Institute of Informatics, Tokyo, Japan
Speech processing, Speech synthesis, Biometrics, Deepfakes, Multimedia Forensics
Sebastian Möller
Professor for Quality and Usability, TU Berlin and Scientific Director, DFKI
Quality of Experience, User Experience, Speech, Dialog, Natural Language Processing
Vera Schmitt
Head of XplaiNLP Research Group at TU Berlin
NLP/LLMs, XAI, HCI, Disinformation, Usable Privacy