🤖 AI Summary
To address the fundamental challenge of context-length limitations hindering large language model (LLM) agents in long-horizon tasks, this paper proposes Context-Folding—a framework for dynamic, efficient context compression via learnable subtask decomposition and execution-path folding. Methodologically, it introduces FoldGRPO, the first end-to-end reinforcement learning algorithm that jointly optimizes task decomposition and folding policies using process-based rewards, moving beyond static, summary-driven compression. Evaluated on complex long-horizon benchmarks—including Deep Research and SWE—the approach matches or exceeds ReAct’s task performance while reducing active context length by up to 10×. This substantial compression significantly outperforms existing summarization-based context management techniques, demonstrating improved scalability and fidelity in extended reasoning trajectories.
📝 Abstract
Large language model (LLM) agents are fundamentally constrained by context length on long-horizon tasks. We introduce Context-Folding, a framework that empowers agents to actively manage their working context. An agent can procedurally branch into a sub-trajectory to handle a subtask and then fold it upon completion, collapsing the intermediate steps while retaining a concise summary of the outcome. To make this behavior learnable, we develop an end-to-end reinforcement learning framework, FoldGRPO, with specific process rewards to encourage effective task decomposition and context management. On complex long-horizon tasks (Deep Research and SWE), our folding agent matches or outperforms the ReAct baselines while using an active context 10$\times$ smaller and significantly outperforms models that rely on summarization-based context management.
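The branch-and-fold mechanism described above can be sketched as a simple context data structure. The following is a minimal illustrative sketch, not the paper's implementation: the class name `FoldingContext`, its methods, and the `[folded]` marker are all assumptions made for illustration.

```python
# Illustrative sketch of context-folding (hypothetical API, not the paper's code):
# an agent "branches" into a sub-trajectory for a subtask, then "folds" it on
# completion, replacing the intermediate steps with a concise summary.

class FoldingContext:
    def __init__(self):
        self.active = []          # the visible working context
        self._branch_start = None # index where the current sub-trajectory began

    def append(self, step):
        self.active.append(step)

    def branch(self):
        # Mark where the sub-trajectory begins.
        self._branch_start = len(self.active)

    def fold(self, summary):
        # Collapse everything since the branch point into one summary entry.
        self.active = self.active[:self._branch_start] + [f"[folded] {summary}"]
        self._branch_start = None


ctx = FoldingContext()
ctx.append("plan: fix failing test")
ctx.branch()
for step in ["open parser.py", "edit line 12", "run tests"]:
    ctx.append(step)
ctx.fold("patched parser bug; tests pass")
print(ctx.active)  # the plan plus one folded summary; the three steps are gone
```

The key property is that the active context grows only with the number of subtasks, not with the number of steps inside each subtask, which is what enables the large reduction in active context length.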