🤖 AI Summary
Large language models (LLMs) commonly suffer from context drift in multi-turn interactions, i.e., the progressive deviation of outputs from the user's original intent across turns. Method: This work introduces the first formal modeling and interpretable dynamical-analysis framework for this phenomenon. Departing from static single-turn evaluation, the authors define drift as the evolution of the per-turn KL divergence between the test model and a goal-consistent reference model. They show that the drift dynamics form a bounded stochastic system that, under noise constraints, converges to a stable equilibrium rather than degrading without limit. A recursive dynamical model is proposed, incorporating an intervenable prompting mechanism. Results: Empirical validation on synthetic rewriting tasks and τ-Bench user-agent simulations confirms that mainstream open-source LLMs exhibit such equilibria; lightweight interventions significantly suppress drift, with close agreement between theoretical predictions and experimental outcomes.
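The drift metric described above can be illustrated with a minimal sketch. The toy vocabulary, the specific distributions, and the direction KL(test ‖ reference) are illustrative assumptions, not details taken from the paper:

```python
import math

def kl_divergence(p, q):
    """KL(p || q) over a shared token vocabulary.

    Assumes both lists are probability distributions (sum to 1)
    and q is strictly positive wherever p is.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical next-token distributions over a 3-token vocabulary:
# the test model's distribution drifts away from the reference across turns.
reference = [0.70, 0.20, 0.10]  # goal-consistent reference model
turn1 = [0.65, 0.25, 0.10]      # early turn: close to the reference
turn5 = [0.40, 0.40, 0.20]      # later turn: noticeably drifted

for name, dist in [("turn1", turn1), ("turn5", turn5)]:
    print(name, kl_divergence(dist, reference))
```

Measuring this per-turn divergence across a conversation yields the drift trajectory whose dynamics the paper then models.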
📝 Abstract
Large Language Models (LLMs) excel at single-turn tasks such as instruction following and summarization, yet real-world deployments require sustained multi-turn interactions where user goals and conversational context persist and evolve. A recurring challenge in this setting is context drift: the gradual divergence of a model's outputs from goal-consistent behavior across turns. Unlike single-turn errors, drift unfolds temporally and is poorly captured by static evaluation metrics. In this work, we present a study of context drift in multi-turn interactions and propose a simple dynamical framework to interpret its behavior. We formalize drift as the turn-wise KL divergence between the token-level predictive distributions of the test model and a goal-consistent reference model, and propose a recurrence model that interprets its evolution as a bounded stochastic process with restoring forces and controllable interventions. We instantiate this framework in both synthetic long-horizon rewriting tasks and realistic user-agent simulations in τ-Bench, measuring drift for several open-weight LLMs used as user simulators. Our experiments consistently reveal stable, noise-limited equilibria rather than runaway degradation, and demonstrate that simple reminder interventions reliably reduce divergence in line with theoretical predictions. Together, these results suggest that multi-turn drift can be understood as a controllable equilibrium phenomenon rather than as inevitable decay, providing a foundation for studying and mitigating context drift in extended interactions.
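The "bounded stochastic process with restoring forces and controllable interventions" can be sketched as a simple first-order recurrence. The AR(1)-style form, the parameter values, and the periodic "reminder" that shrinks accumulated drift are illustrative assumptions rather than the paper's exact model:

```python
import random
import statistics

def simulate_drift(turns=200, restore=0.3, noise_std=0.05,
                   intervention_every=None, intervention_gain=0.5, seed=0):
    """Toy drift recurrence: D_{t+1} = (1 - restore) * D_t + |noise_t|.

    The restoring force (restore > 0) pulls drift back toward zero each
    turn, while nonnegative noise pushes it up, so the process settles
    into a noise-limited equilibrium instead of growing without bound.
    An optional periodic intervention (a "reminder") rescales the
    accumulated drift by intervention_gain. Returns the trajectory.
    """
    rng = random.Random(seed)
    d, trajectory = 0.0, []
    for t in range(turns):
        d = (1.0 - restore) * d + abs(rng.gauss(0.0, noise_std))
        if intervention_every and (t + 1) % intervention_every == 0:
            d *= intervention_gain  # reminder pulls output back toward goal
        trajectory.append(d)
    return trajectory

baseline = simulate_drift()
reminded = simulate_drift(intervention_every=5)

# After a burn-in period, both trajectories plateau; interventions
# lower the equilibrium level rather than merely delaying growth.
print("baseline equilibrium ~", statistics.mean(baseline[100:]))
print("with reminders       ~", statistics.mean(reminded[100:]))
```

Because both runs share the same noise sequence, the reminded trajectory is pointwise no larger than the baseline, mirroring the paper's finding that lightweight interventions suppress drift around a stable equilibrium.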