Context Shapes LLMs' Retrieval-Augmented Fact-Checking Effectiveness

📅 2026-02-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the instability of large language models (LLMs) in long-context fact verification, revealing significant sensitivity to both evidence position and context length. Through a systematic evaluation of open-source models—including Llama-3.1, Qwen2.5, and Qwen3—on the HOVER, FEVEROUS, and ClimateFEVER benchmarks, this work is the first to demonstrate, across multiple models and datasets, the critical influence of contextual structure on retrieval-augmented fact-checking. The findings indicate that while these models possess some parametric factual knowledge, their verification accuracy consistently degrades as context length increases. Moreover, performance is markedly higher when relevant evidence appears at the beginning or end of the context compared to the middle, confirming a robust position bias effect in fact verification tasks.

📝 Abstract
Large language models (LLMs) show strong reasoning abilities across diverse tasks, yet their performance on extended contexts remains inconsistent. While prior research has emphasized mid-context degradation in question answering, this study examines the impact of context in LLM-based fact verification. Using three datasets (HOVER, FEVEROUS, and ClimateFEVER) and five open-source models across different parameter sizes (7B, 32B, and 70B) and model families (Llama-3.1, Qwen2.5, and Qwen3), we evaluate both parametric factual knowledge and the impact of evidence placement across varying context lengths. We find that LLMs exhibit non-trivial parametric knowledge of factual claims and that their verification accuracy generally declines as context length increases. Consistent with prior work, in-context evidence placement plays a critical role: accuracy is consistently higher when relevant evidence appears near the beginning or end of the prompt and lower when it is placed mid-context. These results underscore the importance of prompt structure in retrieval-augmented fact-checking systems.
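The evidence-placement protocol the abstract describes can be sketched as follows. This is a minimal illustrative reconstruction, not the paper's actual harness: the prompt template, distractor passages, and helper names (`build_context`, `make_prompt`) are assumptions, and the model call itself is left abstract.

```python
# Hedged sketch of the evidence-placement evaluation described in the abstract:
# a claim is verified against a context of passages, with the gold-evidence
# passage inserted at the beginning, middle, or end. The LLM call itself is
# intentionally omitted; only the context-construction logic is shown.

def build_context(evidence: str, distractors: list[str], position: str) -> list[str]:
    """Insert the gold-evidence passage among distractors at a chosen position."""
    passages = list(distractors)
    if position == "begin":
        idx = 0
    elif position == "middle":
        idx = len(passages) // 2
    elif position == "end":
        idx = len(passages)
    else:
        raise ValueError(f"unknown position: {position}")
    passages.insert(idx, evidence)
    return passages


def make_prompt(claim: str, passages: list[str]) -> str:
    """Format a fact-verification prompt (template is an assumption, not the paper's)."""
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        f"Context:\n{context}\n\n"
        f"Claim: {claim}\n"
        "Answer SUPPORTED or REFUTED based only on the context."
    )


if __name__ == "__main__":
    evidence = "The Eiffel Tower was completed in 1889."
    distractors = [f"Unrelated filler passage {i}." for i in range(6)]
    for pos in ("begin", "middle", "end"):
        passages = build_context(evidence, distractors, pos)
        print(pos, "-> evidence at index", passages.index(evidence))
```

Sweeping `position` over the three placements while growing the distractor list reproduces the two axes the study varies: evidence position and total context length.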
Problem

Research questions and friction points this paper is trying to address.

retrieval-augmented fact-checking
context length
evidence placement
large language models
fact verification
Innovation

Methods, ideas, or system contributions that make the work stand out.

retrieval-augmented fact-checking
context length effect
evidence placement
large language models
prompt structure