DS@GT at CheckThat! 2025: Evaluating Context and Tokenization Strategies for Numerical Fact Verification

📅 2025-07-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the automated verification of numerical factual claims, i.e., statements involving quantities, comparisons, and temporal expressions. To tackle this, we propose a verification framework that integrates evidence retrieval with natural language inference (NLI). Through systematic evaluation, we analyze the impact of context length and tokenization strategy (e.g., right-to-left, R2L) on numerical reasoning, finding that neither extended context nor R2L tokenization improves performance, indicating that evidence quality, rather than model architecture, is the primary bottleneck. Methodologically, we design a lightweight evidence retrieval pipeline adapted to QuanTemp and combine it with ModernBERT and an NLI classifier for end-to-end verification. Our system achieves a macro-averaged F1 score of 0.57 on CheckThat! 2025 Task 3, ranking among the top four submissions. The code is publicly released.

📝 Abstract
Numerical claims, i.e., statements involving quantities, comparisons, and temporal references, pose unique challenges for automated fact-checking systems. In this study, we evaluate modeling strategies for veracity prediction of such claims using the QuanTemp dataset and our own evidence retrieval pipeline. We investigate three key factors: (1) the impact of supplying more evidence through longer input context windows with ModernBERT, (2) the effect of right-to-left (R2L) tokenization, and (3) their combined influence on classification performance. Contrary to prior findings in arithmetic reasoning tasks, R2L tokenization does not improve natural language inference (NLI) on numerical claims. Nor does a longer context window enhance veracity performance, highlighting evidence quality as the dominant bottleneck. Our best-performing system achieves a competitive macro-averaged F1 score of 0.57, placing us among the top four submissions in Task 3 of CheckThat! 2025. Our code is available at https://github.com/dsgt-arc/checkthat-2025-numerical.
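The retrieve-then-verify design described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the token-overlap scorer stands in for their actual retrieval pipeline, the label set mirrors QuanTemp-style veracity labels, and the NLI step (a ModernBERT classifier in the paper) is left as a stub.

```python
# Illustrative retrieve-then-verify sketch for numerical fact checking.
# The lexical-overlap retriever and stubbed classifier are stand-ins,
# not the system described in the paper.

def tokenize(text: str) -> set[str]:
    """Lowercased whitespace tokenization (illustrative only)."""
    return set(text.lower().split())

def retrieve(claim: str, corpus: list[str], k: int = 3) -> list[str]:
    """Rank evidence passages by token overlap with the claim, keep top-k."""
    claim_toks = tokenize(claim)
    ranked = sorted(corpus,
                    key=lambda doc: len(claim_toks & tokenize(doc)),
                    reverse=True)
    return ranked[:k]

LABELS = ("True", "False", "Conflicting")  # QuanTemp-style veracity labels

def verify(claim: str, corpus: list[str]) -> str:
    """Classify a claim against its retrieved evidence.
    A real system would encode '[claim] [SEP] [evidence]' pairs with an
    NLI model (e.g., a ModernBERT-based classifier); stubbed here."""
    evidence = retrieve(claim, corpus)
    # ... model inference over (claim, evidence) pairs would go here ...
    return LABELS[2] if not evidence else LABELS[0]
```

The paper's finding that retrieval quality dominates suggests most of the engineering effort belongs in the `retrieve` step rather than in the classifier architecture.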
Problem

Research questions and friction points this paper is trying to address.

Evaluating context and tokenization for numerical fact verification
Assessing impact of evidence quantity and context window size
Investigating right-to-left tokenization effects on numerical claims
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluated ModernBERT with longer context windows
Tested right-to-left tokenization for NLI
Combined context and tokenization for classification
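The R2L tokenization evaluated above groups digits from the least-significant end so that chunk boundaries align with place value, a technique borrowed from prior arithmetic-reasoning work. A minimal sketch of the digit-grouping idea, assuming 3-digit chunks (the actual tokenizer integration in the paper is not reproduced here):

```python
def chunk_digits_r2l(number: str, size: int = 3) -> list[str]:
    """Split a digit string into groups of `size` starting from the right,
    so boundaries align with place value regardless of number length."""
    chunks = []
    while number:
        chunks.append(number[-size:])  # take the least-significant group
        number = number[:-size]
    return chunks[::-1]  # restore left-to-right reading order

def chunk_digits_l2r(number: str, size: int = 3) -> list[str]:
    """Left-to-right grouping for comparison: boundaries drift with length."""
    return [number[i:i + size] for i in range(0, len(number), size)]
```

With R2L grouping, "1234567" becomes ["1", "234", "567"], so the trailing chunks always carry the same place values; the L2R split ["123", "456", "7"] does not have this property, which is why R2L was hypothesized to help numerical reasoning.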
Maximilian Heil
Georgia Institute of Technology, North Ave NW, Atlanta, GA 30332
Aleksandar Pramov