The Context Gathering Decision Process: A POMDP Framework for Agentic Search

📅 2026-05-07
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses a limitation of large language model (LLM) agents in complex environments: the relevant state far exceeds the context window, so an agent's lossy working memory often leads to redundant exploration or premature termination. The authors formalize agentic search as a Context Gathering Decision Process (CGDP), a specialized partially observable Markov decision process (POMDP), model the LLM's behavior as approximate Thompson Sampling within it, and decompose the implicit search into explicit, modular predicate-based operations. From this framing they derive two plug-and-play interventions: a persistent predicate-based belief state and a programmatic exhaustion gate. Across four methods and three question-answering domains, the belief state improves multi-hop reasoning by up to 11.4%, and the exhaustion gate saves up to 39% of tokens without degrading agent performance.
πŸ“ Abstract
Large Language Model (LLM) agents are deployed in complex environments -- such as massive codebases, enterprise databases, and conversational histories -- where the relevant state far exceeds their context windows. To navigate these spaces, an agent must iteratively explore the environment to find relevant information. However, without explicit infrastructure, an agent's working memory can degrade into lossy representations of the search state, resulting in redundant work (e.g., repetitive looping) and premature stopping. In this work, we formalize this challenge as the Context Gathering Decision Process (CGDP), a specialized Partially Observable Markov Decision Process, where an agent's objective is to adaptively refine its belief state to isolate the necessary information for a task. We model an LLM's behavior as approximate Thompson Sampling within this CGDP, and introduce a predicate-based method that decomposes an LLM's implicit search into explicit and modular operations. We then derive two plug-and-play interventions for iterative LLM agents: a persistent, predicate-based belief state that bounds context while preserving multi-hop reasoning, and a programmatic exhaustion gate that halts unproductive search without premature stopping. Across four methods and three question-answering domains, we empirically validate that replacing an LLM's implicit state with our CGDP-motivated belief state improves multi-hop reasoning by up to $11.4\%$, while the modular programmatic exhaustion detection saves up to $39\%$ of tokens without any degradation in agent performance. Ultimately, we argue that framing the LLM agent loop as a CGDP can guide the design of modular, non-interfering improvements to agentic search harnesses.
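The two interventions named in the abstract can be pictured with a minimal sketch. It is not the paper's implementation: the class names `BeliefState` and `ExhaustionGate`, the `patience` rule (halt after a fixed number of consecutive steps that resolve nothing), and the predicate strings are all hypothetical and chosen only for illustration. The abstract does not specify how the belief state is stored or how exhaustion is detected.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class BeliefState:
    """Hypothetical predicate-based belief state: each predicate is a named
    sub-question that is either still open or resolved with a piece of evidence."""
    open_predicates: Set[str]
    resolved: Dict[str, str] = field(default_factory=dict)

    def update(self, observation: Dict[str, str]) -> int:
        """Fold one observation into the belief; return how many predicates it newly resolved."""
        gained = 0
        for predicate, evidence in observation.items():
            if predicate in self.open_predicates:
                self.open_predicates.remove(predicate)
                self.resolved[predicate] = evidence
                gained += 1
        return gained


class ExhaustionGate:
    """Hypothetical programmatic gate: halt after `patience` consecutive steps without progress."""

    def __init__(self, patience: int = 2) -> None:
        self.patience = patience
        self.stale_steps = 0

    def should_halt(self, gained: int) -> bool:
        # Any progress resets the counter; otherwise count one more stale step.
        self.stale_steps = 0 if gained > 0 else self.stale_steps + 1
        return self.stale_steps >= self.patience


def run_search(belief: BeliefState, observations: List[Dict[str, str]], gate: ExhaustionGate) -> int:
    """Consume observations until every predicate is resolved or the gate fires; return steps taken."""
    steps = 0
    for observation in observations:
        if not belief.open_predicates:
            break
        steps += 1
        if gate.should_halt(belief.update(observation)):
            break
    return steps


# Usage: a two-hop question whose second hop is never found within the gate's patience.
demo_belief = BeliefState(open_predicates={"author_of(book)", "birthplace(author)"})
demo_steps = run_search(
    demo_belief,
    [{"author_of(book)": "doc_3"}, {}, {}, {"birthplace(author)": "doc_9"}],
    ExhaustionGate(patience=2),
)
print(demo_steps, demo_belief.resolved)
```

In this toy run the gate stops the loop after the second empty observation, so the fourth observation is never requested. The belief state stays bounded by the number of predicates rather than growing with the search transcript, which is the property the abstract attributes to the persistent belief state.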
Problem

Research questions and friction points this paper is trying to address.

Context Gathering
LLM Agents
Partially Observable Markov Decision Process
Agentic Search
Working Memory
Innovation

Methods, ideas, or system contributions that make the work stand out.

Context Gathering Decision Process
Partially Observable Markov Decision Process
LLM Agents
Predicate-Based Belief State
Programmatic Exhaustion Gate