🤖 AI Summary
This paper identifies a pervasive “lost-in-the-later” phenomenon in large language models (LLMs) during open-ended question answering: LLMs systematically underutilize information positioned later in long contexts, leading to factual inconsistency and hallucination, an effect that reasoning models and chain-of-thought prompting fail to mitigate. To address this, the authors propose CoPE, a comprehensive evaluation framework, and introduce MultiWikiAtomic, a multilingual benchmark dataset designed to systematically quantify contextual-knowledge utilization. They provide the first empirical evidence and quantitative measurement of this positional bias, and further propose a contextual-knowledge-informed (CK-informed) prompting method that improves factual accuracy and reduces hallucination in summarization tasks. The work establishes a paradigm for jointly modeling contextual and parametric knowledge and delivers a reproducible, rigorous evaluation benchmark for future research.
📝 Abstract
Large language models (LLMs) are capable of leveraging both contextual and parametric knowledge, but how they prioritize and integrate these sources remains underexplored. We introduce CoPE, a novel evaluation framework that systematically measures contextual knowledge (CK) and parametric knowledge (PK) across models and languages. Using our MultiWikiAtomic dataset in English, Spanish, and Danish, we analyze how LLMs integrate context, prioritize information, and incorporate PK in open-ended question answering. Our analysis uncovers a phenomenon we call lost-in-the-later, where LLMs tend to overlook or deprioritize information that appears later in a given context, revealing a strong positional bias that affects contextual grounding. We further find that reasoning models, as well as non-reasoning models prompted with chain-of-thought (CoT), use context even less than non-reasoning models without CoT and fail to mitigate the lost-in-the-later effect. CoT prompting, in particular, results in lower recall and shorter responses, leading to degraded contextual grounding. Based on these insights, we design prompt-based methods to effectively leverage input context. A case study applying CoPE to summarization demonstrates that CK-informed prompting improves factual grounding and reduces hallucination.
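The positional measurement the abstract describes can be illustrated with a minimal sketch: split the context into sentences, check which ones a model's answer actually draws on, and bucket the utilization rate by position (early/middle/late). This is not the paper's CoPE implementation — CoPE matches atomic facts, whereas this uses a crude token-overlap heuristic, and all function names and data here are illustrative assumptions.

```python
# Illustrative sketch (NOT the paper's CoPE code): estimate how much of a
# context an answer uses, bucketed by where the information appears.
# "Used" is approximated by content-token overlap; CoPE matches atomic facts.

def token_overlap(sentence: str, answer: str, threshold: float = 0.4) -> bool:
    """Treat a context sentence as 'used' if enough of its content tokens appear in the answer."""
    sent_tokens = {t.lower().strip(".,") for t in sentence.split() if len(t) > 3}
    if not sent_tokens:
        return False
    ans_tokens = {t.lower().strip(".,") for t in answer.split()}
    return len(sent_tokens & ans_tokens) / len(sent_tokens) >= threshold

def positional_utilization(context_sentences: list[str], answer: str, n_buckets: int = 3) -> list[float]:
    """Fraction of context sentences judged 'used', per positional bucket."""
    counts = [0] * n_buckets
    used = [0] * n_buckets
    for i, sent in enumerate(context_sentences):
        bucket = min(i * n_buckets // len(context_sentences), n_buckets - 1)
        counts[bucket] += 1
        if token_overlap(sent, answer):
            used[bucket] += 1
    return [u / c if c else 0.0 for u, c in zip(used, counts)]

# Toy example: the answer reuses the early and middle facts but drops the late one.
context = [
    "The bridge opened in 1932 after eight years of construction.",
    "It spans roughly 1,149 metres across the harbour.",
    "A major refurbishment added pedestrian walkways in 1998.",
]
answer = "The bridge opened in 1932 and spans about 1,149 metres across the harbour."
print(positional_utilization(context, answer))  # → [1.0, 1.0, 0.0]
```

A declining utilization curve across buckets, aggregated over many question-answer pairs, would be the lost-in-the-later signature the paper quantifies.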