The Complexity Trap: Simple Observation Masking Is as Efficient as LLM Summarization for Agent Context Management

📅 2025-08-29
🤖 AI Summary
Large language model (LLM) agents incur high computational cost and reduced efficiency on complex software engineering (SE) tasks due to excessively long input contexts. Method: This work systematically compares observation masking—a lightweight, rule-based context compression strategy—with mainstream LLM-driven summarization approaches. The authors implement a configurable observation masking mechanism within the SWE-agent framework to selectively filter historical observations and evaluate it across multiple models on the SWE-bench Verified benchmark. Contribution/Results: Observation masking halves context-related inference cost while improving the solve rate on Qwen3-Coder 480B from 53.8% to 54.8%—matching or slightly exceeding LLM summarization baselines. The findings suggest that observation masking offers superior efficiency, robustness, and practicality for context management in SE agents, and that simpler strategies can be the more effective choice for real-world agent design.

📝 Abstract
Large Language Model (LLM)-based agents solve complex tasks through iterative reasoning, exploration, and tool use, a process that can result in long, expensive context histories. While state-of-the-art Software Engineering (SE) agents like OpenHands or Cursor use LLM-based summarization to tackle this issue, it is unclear whether the increased complexity offers tangible performance benefits compared to simply omitting older observations. We present a systematic comparison of these strategies within SWE-agent on SWE-bench Verified across five diverse model configurations. We find that a simple observation-masking strategy halves cost relative to a raw agent while matching, and sometimes slightly exceeding, the solve rate of LLM summarization. For example, with Qwen3-Coder 480B, masking improves the solve rate from 53.8% (raw agent) to 54.8%, while remaining competitive with summarization at a lower cost. These results suggest that, at least within SWE-agent on SWE-bench Verified, the most effective and efficient context management can be the simplest. We release code and data for reproducibility.
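To make the idea concrete, here is a minimal sketch of observation masking as described in the abstract: keep the most recent tool observations verbatim and replace older ones with a short placeholder before the history is sent to the model. This is an illustrative reconstruction, not SWE-agent's actual implementation; the names `mask_observations`, `keep_last`, and the message schema are assumptions.

```python
# Minimal sketch of observation masking (assumed names and message schema,
# not SWE-agent's real API): older tool observations are replaced with a
# short placeholder, while the last `keep_last` observations stay intact.

PLACEHOLDER = "[old observation omitted to save context]"

def mask_observations(history, keep_last=3):
    """Return a copy of `history` where all but the most recent
    `keep_last` observation messages are replaced by a placeholder."""
    obs_indices = [i for i, msg in enumerate(history)
                   if msg["role"] == "observation"]
    # Indices of observations that fall outside the keep window.
    to_mask = set(obs_indices[:-keep_last]) if keep_last > 0 else set(obs_indices)
    return [
        {**msg, "content": PLACEHOLDER} if i in to_mask else msg
        for i, msg in enumerate(history)
    ]

# Example: a short agent trajectory with three tool observations.
history = [
    {"role": "assistant", "content": "run tests"},
    {"role": "observation", "content": "...3000 lines of pytest output..."},
    {"role": "assistant", "content": "open file"},
    {"role": "observation", "content": "...file contents..."},
    {"role": "assistant", "content": "apply edit"},
    {"role": "observation", "content": "edit applied"},
]
masked = mask_observations(history, keep_last=2)
```

With `keep_last=2`, only the oldest observation (the long pytest output) is masked; the two most recent observations remain available to the model, which is the property the paper exploits: recent feedback matters, stale output mostly adds cost.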
Problem

Research questions and friction points this paper is trying to address.

Compares LLM summarization vs simple masking for agent context management
Evaluates cost and performance trade-offs in software engineering agents
Tests strategies on SWE-bench with multiple model configurations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Simple observation masking halves cost
Masking matches LLM summarization solve rates
Effective context management can be simplest
Tobias Lindenbauer
JetBrains Research, School of Computation, Information and Technology, Technical University of Munich
Igor Slinko
JetBrains Research
Ludwig Felder
School of Computation, Information and Technology, Technical University of Munich
Egor Bogomolov
JetBrains Research
machine learning for software engineering
Yaroslav Zharov
Researcher @ JetBrains
Deep Learning · AI Agents