Escaping the Context Bottleneck: Active Context Curation for LLM Agents via Reinforcement Learning

📅 2026-04-13
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the performance degradation of large language models on long-horizon tasks caused by context bottlenecks and information overload. The authors propose a symbiotic architecture that decouples context management from task execution: ContextCurator, a lightweight policy model trained via reinforcement learning, actively compresses working memory while preserving critical reasoning anchors, and TaskExecutor, a frozen foundation model, carries out the task on the curated context. On WebArena, the framework raises the success rate of Gemini-3.0-flash from 36.4% to 41.2% while cutting token usage by 8.8%; on DeepSearch, it reaches a 57.1% success rate (versus 53.9%) while reducing token consumption by a factor of 8. A 7B ContextCurator matches the context management performance of GPT-4o.
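The decoupling described above can be pictured as a loop in which a curator rewrites the working memory before each call to a frozen executor. The following is a minimal sketch under stated assumptions, not the authors' implementation: `MemoryItem`, `stub_curator`, `run_episode`, and the `is_anchor` flag are hypothetical names, and the hand-written pruning rule merely stands in for the reinforcement-learned ContextCurator policy.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class MemoryItem:
    text: str
    is_anchor: bool = False  # hypothetical flag: a fact needed for later steps


def stub_curator(memory: List[MemoryItem], keep_recent: int = 2) -> List[MemoryItem]:
    """Hand-written stand-in for the learned curator (an assumption, not the paper's policy).

    Keeps every item marked as an anchor plus the most recent items; drops the rest.
    """
    cutoff = len(memory) - keep_recent
    return [m for i, m in enumerate(memory) if m.is_anchor or i >= cutoff]


def run_episode(
    observations: List[MemoryItem],
    executor: Callable[[str], str],
    curator: Callable[[List[MemoryItem]], List[MemoryItem]] = stub_curator,
) -> Tuple[List[str], List[MemoryItem]]:
    """Alternate curation and execution: the executor only ever sees curated context."""
    memory: List[MemoryItem] = []
    actions: List[str] = []
    for obs in observations:
        memory.append(obs)
        memory = curator(memory)  # context management step
        context = "\n".join(m.text for m in memory)
        actions.append(executor(context))  # task execution step (frozen model in the paper)
    return actions, memory
```

In this sketch the executor is any callable from a context string to an action, so a frozen model could be swapped in without touching the curation logic; that separation is the point the summary makes.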

📝 Abstract
Large Language Models (LLMs) struggle with long-horizon tasks due to the "context bottleneck" and the "lost-in-the-middle" phenomenon, where accumulated noise from verbose environments degrades reasoning over multi-turn interactions. To address this issue, we introduce a symbiotic framework that decouples context management from task execution. Our architecture pairs a lightweight, specialized policy model, ContextCurator, with a powerful frozen foundation model, TaskExecutor. Trained via reinforcement learning, ContextCurator actively reduces information entropy in the working memory. It aggressively prunes environmental noise while preserving reasoning anchors, that is, sparse data points that are critical for future deductions. On WebArena, our framework improves the success rate of Gemini-3.0-flash from 36.4% to 41.2% while reducing token consumption by 8.8% (from 47.4K to 43.3K). On DeepSearch, it achieves a 57.1% success rate, compared with 53.9%, while reducing token consumption by a factor of 8. Remarkably, a 7B ContextCurator matches the context management performance of GPT-4o, providing a scalable and computationally efficient paradigm for autonomous long-horizon agents.
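The reported figures mix absolute and relative changes, so a short recomputation may help when reading them. This is a sketch using only the rounded numbers quoted in the abstract; `relative_change` is a helper introduced here for illustration. Recomputing from the rounded token counts gives a reduction of roughly 8.6%, slightly below the quoted 8.8%; the gap presumably comes from rounding in the reported counts, which is an assumption rather than something the abstract states.

```python
def relative_change(before: float, after: float) -> float:
    """Signed relative change from `before` to `after` (negative means a reduction)."""
    return (after - before) / before


# WebArena success rate: 36.4% -> 41.2%
success_gain_points = 41.2 - 36.4                    # absolute gain in percentage points
success_gain_relative = relative_change(36.4, 41.2)  # gain relative to the baseline

# WebArena token usage: 47.4K -> 43.3K (rounded counts from the abstract)
token_change = relative_change(47.4, 43.3)

# DeepSearch: "a factor of 8" means usage falls to 1/8 of the baseline
deepsearch_token_change = relative_change(8.0, 1.0)

print(f"success gain: {success_gain_points:.1f} points ({success_gain_relative:.1%} relative)")
print(f"WebArena token change: {token_change:.1%}")
print(f"DeepSearch token change: {deepsearch_token_change:.1%}")
```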
Problem

Research questions and friction points this paper is trying to address.

context bottleneck
lost-in-the-middle
long-horizon tasks
information entropy
working memory
Innovation

Methods, ideas, or system contributions that make the work stand out.

context curation
reinforcement learning
large language models
information entropy reduction
long-horizon reasoning