🤖 AI Summary
LLM-based web agents face a fundamental trade-off in long-horizon tasks: accumulating raw interaction history leads to context saturation, whereas fixed global summarization causes irreversible loss of critical operational details. To address this, we propose AgentFold—a novel paradigm inspired by human retrospective cognitive consolidation—that enables dynamic, multi-granularity context folding. It preserves fine-grained action traces while abstracting multi-step subtask logic into higher-level representations. Unlike ReAct-style history accumulation, AgentFold learns folding operations via simple supervised fine-tuning, maintaining a lightweight, dynamic cognitive workspace without continual pre-training or reinforcement learning. Evaluated on BrowseComp and its Chinese counterpart BrowseComp-ZH, AgentFold-30B-A3B achieves 36.2% and 47.3% success rates, respectively—substantially outperforming open-source agents of comparable scale and closed-source baselines such as OpenAI o4-mini.
📝 Abstract
LLM-based web agents show immense promise for information seeking, yet their effectiveness on long-horizon tasks is hindered by a fundamental trade-off in context management. Prevailing ReAct-based agents suffer from context saturation as they accumulate noisy, raw histories, while methods that rigidly summarize the full history at every step risk the irreversible loss of critical details. To address these issues, we introduce AgentFold, a novel agent paradigm centered on proactive context management, inspired by the human cognitive process of retrospective consolidation. AgentFold treats its context as a dynamic cognitive workspace to be actively sculpted, rather than a passive log to be filled. At each step, it learns to execute a `folding' operation, which manages its historical trajectory at multiple scales: it can perform granular condensations to preserve vital, fine-grained details, or deep consolidations to abstract away entire multi-step sub-tasks. The results on prominent benchmarks are striking: with simple supervised fine-tuning (without continual pre-training or RL), our AgentFold-30B-A3B agent achieves 36.2% on BrowseComp and 47.3% on BrowseComp-ZH. Notably, this performance not only surpasses or matches open-source models of a dramatically larger scale, such as DeepSeek-V3.1-671B-A37B, but also surpasses leading proprietary agents like OpenAI's o4-mini.
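To make the two folding scales concrete, here is a minimal sketch of a multi-scale context workspace. All class and method names (`FoldingContext`, `condense`, `consolidate`) are illustrative assumptions, not the paper's actual API: granular condensation rewrites a single step in place, while deep consolidation replaces a whole multi-step sub-task span with one higher-level block.

```python
from dataclasses import dataclass

@dataclass
class Block:
    """One entry in the agent's working context: a raw step or a summary."""
    text: str
    level: int = 0  # 0 = raw step; higher values = deeper consolidation

class FoldingContext:
    """Hypothetical workspace illustrating the folding idea (not the paper's code)."""

    def __init__(self) -> None:
        self.blocks: list[Block] = []

    def append_step(self, observation: str) -> None:
        # ReAct-style accumulation: raw steps are appended as level-0 blocks.
        self.blocks.append(Block(observation))

    def condense(self, index: int, summary: str) -> None:
        # Granular condensation: shorten one block while keeping its position,
        # preserving a fine-grained trace of that individual step.
        self.blocks[index] = Block(summary, level=max(1, self.blocks[index].level))

    def consolidate(self, start: int, end: int, summary: str) -> None:
        # Deep consolidation: abstract an entire sub-task span [start, end)
        # into a single block one level above its deepest member.
        depth = 1 + max(b.level for b in self.blocks[start:end])
        self.blocks[start:end] = [Block(summary, level=depth)]

    def render(self) -> str:
        # The prompt the agent would actually see at the next step.
        return "\n".join(b.text for b in self.blocks)
```

For example, after four browsing steps the agent might consolidate the first three (a completed search sub-task) into one abstract block, then condense the remaining page-reading step to its key finding, keeping the workspace short without discarding the detail it still needs.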