Context as a Tool: Context Management for Long-Horizon SWE-Agents

📅 2025-12-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address semantic drift, reasoning degradation, and context explosion in long-horizon software engineering agents operating over repository-scale codebases, all stemming from uncontrolled context growth, this paper introduces the "Context-as-Tool" (CAT) paradigm, which explicitly models context management as a callable, learnable tool. Methodologically, the authors (1) construct a structured workspace that decouples high-fidelity short-term interactions from compressed long-term memory; (2) design CAT-GENERATOR, a trajectory-level supervision framework enabling milestone-driven proactive compression; and (3) develop SWE-Compressor, a context-aware compression model. Evaluated on SWE-Bench-Verified, the approach achieves a 57.6% task success rate, significantly outperforming ReAct baselines and static compression methods, while keeping long-range reasoning robust and scalable under a fixed context budget.

📝 Abstract
Agents based on large language models have recently shown strong potential on real-world software engineering (SWE) tasks that require long-horizon interaction with repository-scale codebases. However, most existing agents rely on append-only context maintenance or passively triggered compression heuristics, which often lead to context explosion, semantic drift, and degraded reasoning in long-running interactions. We propose CAT, a new context management paradigm that elevates context maintenance to a callable tool integrated into the decision-making process of agents. CAT formalizes a structured context workspace consisting of stable task semantics, condensed long-term memory, and high-fidelity short-term interactions, and enables agents to proactively compress historical trajectories into actionable summaries at appropriate milestones. To support context management for SWE-agents, we propose a trajectory-level supervision framework, CAT-GENERATOR, based on an offline data construction pipeline that injects context-management actions into complete interaction trajectories. Using this framework, we train a context-aware model, SWE-Compressor. Experiments on SWE-Bench-Verified demonstrate that SWE-Compressor reaches a 57.6% solved rate and significantly outperforms ReAct-based agents and static compression baselines, while maintaining stable and scalable long-horizon reasoning under a bounded context budget.
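The abstract's three-tier workspace (stable task semantics, condensed long-term memory, high-fidelity short-term interactions) with compression exposed as a callable tool can be sketched as a small data structure. This is a minimal illustration under assumed names; the paper does not publish this interface, and `summarize` stands in for the trained SWE-Compressor model.

```python
from dataclasses import dataclass, field

@dataclass
class ContextWorkspace:
    # Stable task semantics: never compressed, always in the prompt.
    task_semantics: str
    # Condensed long-term memory: actionable summaries of past work.
    long_term_memory: list[str] = field(default_factory=list)
    # High-fidelity short-term buffer: recent raw interactions.
    short_term: list[str] = field(default_factory=list)

    def compress(self, summarize) -> None:
        """Context management as a callable tool: fold the short-term
        buffer into one condensed summary in long-term memory."""
        if self.short_term:
            self.long_term_memory.append(summarize(self.short_term))
            self.short_term.clear()

    def render(self) -> str:
        # Prompt assembled from all three tiers.
        return "\n".join([self.task_semantics,
                          *self.long_term_memory,
                          *self.short_term])

# Hypothetical usage: the agent invokes compress() at a milestone.
ws = ContextWorkspace("Fix failing test in repo X")
ws.short_term += ["ran pytest: 1 failure", "read utils.py"]
ws.compress(lambda msgs: f"summary of {len(msgs)} steps")
```

The key design point, as described in the abstract, is that compression is a first-class action the agent decides to take, rather than a heuristic triggered passively when the context overflows.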
Problem

Research questions and friction points this paper is trying to address.

Addresses context explosion in long-horizon software engineering agents
Proposes proactive context compression to maintain stable reasoning
Enables scalable agent interactions with repository-scale codebases
Innovation

Methods, ideas, or system contributions that make the work stand out.

Context management as callable tool for agents
Structured workspace with proactive compression at milestones
Offline supervision framework trains context-aware compressor model
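The ideas above (proactive compression at milestones, a bounded context budget) can be illustrated with a toy agent loop. Everything here is an assumption for the sketch: the `policy`/`env` interfaces, the whitespace token heuristic, and the budget value are illustrative, not the paper's training setup.

```python
def run_agent(policy, env, summarize, budget=200, max_steps=20):
    """Toy loop where 'compress' is an explicit action the policy can
    choose at a milestone; compression is also forced if the rendered
    context exceeds the fixed budget."""
    long_term, short_term = [], []  # condensed memory vs. raw recent steps

    def render():
        return "\n".join(long_term + short_term)

    for _ in range(max_steps):
        action = policy(render())
        # Proactive (policy-chosen) or budget-forced compression.
        if action == "compress" or len(render().split()) > budget:
            if short_term:
                long_term.append(summarize(short_term))
                short_term.clear()
            continue
        observation = env(action)
        short_term.append(f"{action} -> {observation}")
        if observation == "done":
            return render()
    return render()
```

In contrast to append-only ReAct-style maintenance, the rendered context stays bounded: old steps survive only as condensed, actionable summaries.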