When Does Context Help? Error Dynamics of Contextual Information in Large Language Models

📅 2026-02-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of a unified theoretical understanding of how in-context information influences output errors in large language models during inference. It proposes the first additive error decomposition, establishing a unified theoretical framework that analyzes error dynamics within Transformers and reveals the geometric relationship between context-induced corrections and baseline errors. From this analysis, the authors derive necessary conditions for error reduction and an explicit upper bound on the norm of the contextual correction. Extending the error dynamics from single- to multi-layer Transformers, and modeling context-query correlations and geometric constraints, they validate the theory on in-context learning, retrieval-augmented generation, and memory-evolution tasks. Guided by these insights, their context selection strategy yields a 0.6% performance improvement.

📝 Abstract
Contextual information at inference time, such as demonstrations, retrieved knowledge, or interaction history, can substantially improve large language models (LLMs) without parameter updates, yet its theoretical role remains poorly understood beyond specific settings such as in-context learning (ICL). We present a unified theoretical framework for analyzing the effect of arbitrary contextual information in Transformer-based LLMs. Our analysis characterizes contextual influence through output error dynamics. In a single-layer Transformer, we prove that the context-conditioned error vector decomposes additively into the baseline error vector and a contextual correction vector. This yields necessary geometric conditions for error reduction: the contextual correction must align with the negative baseline error and satisfy a norm constraint. We further show that the contextual correction norm admits an explicit upper bound determined by context-query relevance and complementarity. These results extend to multi-context and multi-layer Transformers. Experiments across ICL, retrieval-augmented generation, and memory evolution validate our theory and motivate a principled context selection strategy that improves performance by $0.6\%$.
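The abstract's alignment-plus-norm condition follows from elementary vector geometry, and a small numeric sketch makes it concrete. This is an illustration in our own notation (the vector names `e_base` and `delta` are hypothetical, not the paper's symbols), not the paper's formal theorem: writing the context-conditioned error as e_ctx = e_base + delta and expanding ||e_base + delta||², the error norm strictly shrinks exactly when ⟨delta, -e_base⟩ > ||delta||²/2, i.e. the correction must point toward the negative baseline error and keep its norm bounded relative to that alignment.

```python
import numpy as np

# Sketch of the geometric error-reduction condition (our own derivation,
# hypothetical names): with e_ctx = e_base + delta,
#   ||e_base + delta||^2 = ||e_base||^2 + 2<e_base, delta> + ||delta||^2,
# so ||e_ctx|| < ||e_base||  iff  <delta, -e_base> > ||delta||^2 / 2.

def context_reduces_error(e_base, delta):
    """True iff ||e_base + delta|| < ||e_base|| (strict error reduction)."""
    return float(delta @ -e_base) > 0.5 * float(delta @ delta)

rng = np.random.default_rng(0)
e_base = rng.normal(size=8)       # synthetic baseline error vector

helpful = -0.5 * e_base           # aligned with -e_base, moderate norm
overshoot = -3.0 * e_base         # aligned, but norm too large: overshoots

assert context_reduces_error(e_base, helpful)
assert not context_reduces_error(e_base, overshoot)
# Cross-check the condition against the norms directly.
assert np.linalg.norm(e_base + helpful) < np.linalg.norm(e_base)
assert np.linalg.norm(e_base + overshoot) > np.linalg.norm(e_base)
```

The overshoot case shows why alignment alone is insufficient: a correction pointing in exactly the right direction still increases the error if its norm exceeds the bound, which is the intuition behind the paper's norm constraint on contextual corrections.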
Problem

Research questions and friction points this paper is trying to address.

contextual information · error dynamics · large language models · Transformer · in-context learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

contextual information · error dynamics · Transformer-based LLMs · contextual correction · in-context learning