A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression

📅 2026-04-21
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the quadratic growth in cumulative token cost caused by redundant interaction history in long-horizon terminal agent tasks, which limits sustained decision-making. It proposes TACO, a self-evolving, task-aware context compression framework that automatically discovers and refines compression rules from agent trajectories, condensing observational context without relying on handcrafted heuristics or fixed prompts. The method is plug-and-play and compatible with mainstream agent frameworks and large language models. On TerminalBench, TACO improves accuracy by 1%–4% across strong agentic models and by a further 2%–3% under the same token budget; with MiniMax-2.5, it reduces token overhead by around 10% while improving performance on most benchmarks.
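To make the idea of applying a compression rule to a terminal observation concrete, here is a minimal sketch. The summary does not describe TACO's rule format or how rules are learned, so everything below is a hypothetical illustration: the `CompressionRule` class, the `compress_observation` function, and the "keep only the last lines" policy are assumptions, and the rules are hard-coded rather than discovered from trajectories.

```python
import re
from dataclasses import dataclass


@dataclass
class CompressionRule:
    """Hypothetical rule: if `pattern` matches the command, keep only the last `tail` output lines."""
    pattern: str
    tail: int


def compress_observation(command: str, output: str, rules: list[CompressionRule]) -> str:
    """Apply the first matching rule to a command's output; otherwise return it unchanged."""
    lines = output.splitlines()
    for rule in rules:
        if re.search(rule.pattern, command) and len(lines) > rule.tail:
            dropped = len(lines) - rule.tail
            # Replace the dropped prefix with a one-line marker so the agent knows text was cut.
            return "\n".join([f"[{dropped} lines omitted]"] + lines[-rule.tail:])
    return output  # no rule applies: keep the raw observation


# Illustrative rule (an assumption): package-install logs are long, and only the tail matters.
rules = [CompressionRule(pattern=r"^pip install", tail=2)]
raw = "\n".join(f"line {i}" for i in range(5))
print(compress_observation("pip install numpy", raw, rules))
```

A self-evolving system in the sense of the summary would add, revise, or retire such rules based on observed trajectories; that update loop is not sketched here because the source gives no details about it.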

📝 Abstract
As model capabilities advance, research has increasingly shifted toward long-horizon, multi-turn terminal-centric agentic tasks, where raw environment feedback is often preserved in the interaction history to support future decisions. However, repeatedly retaining such feedback introduces substantial redundancy and causes cumulative token cost to grow quadratically with the number of steps, hindering long-horizon reasoning. Although observation compression can mitigate this issue, the heterogeneity of terminal environments makes heuristic-based or fixed-prompt methods difficult to generalize. We propose TACO, a plug-and-play, self-evolving Terminal Agent Compression framework that automatically discovers and refines compression rules from interaction trajectories for existing terminal agents. Experiments on TerminalBench (TB 1.0 and TB 2.0) and four additional terminal-related benchmarks (i.e., SWE-Bench Lite, CompileBench, DevEval, and CRUST-Bench) show that TACO consistently improves performance across mainstream agent frameworks and strong backbone models. With MiniMax-2.5, it improves performance on most benchmarks while reducing token overhead by around 10%. On TerminalBench, it brings consistent gains of 1%-4% across strong agentic models, and further improves accuracy by around 2%-3% under the same token budget. These results demonstrate the effectiveness and generalization of self-evolving, task-aware compression for terminal agents.
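The quadratic growth in cumulative token cost can be checked with a little arithmetic. This is a minimal sketch, assuming every step appends a fixed number of observation tokens and the full history is re-read as the prompt at each step. The function name, the `keep_ratio` parameter, and the token counts are illustrative assumptions, not figures from the paper.

```python
def cumulative_prompt_tokens(steps: int, obs_tokens: int, keep_ratio: float = 1.0) -> int:
    """Total prompt tokens over an episode when the whole history is re-read at every step.

    Assumes each step produces `obs_tokens` of raw observation, of which the
    fraction `keep_ratio` is retained in the history (1.0 means no compression).
    """
    kept_per_step = int(obs_tokens * keep_ratio)
    total = 0
    history = 0
    for _ in range(steps):
        total += history          # the prompt at this step contains all earlier observations
        history += kept_per_step  # the new observation joins the history
    return total


for steps in (10, 20, 40):
    print(steps, cumulative_prompt_tokens(steps, obs_tokens=500))
```

Under these assumptions the total equals `kept_per_step * steps * (steps - 1) / 2`, so doubling the number of steps roughly quadruples the cost. Shrinking each retained observation scales the whole sum, which is why compression matters more as episodes get longer.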
Problem

Research questions and friction points this paper is trying to address.

terminal agents
long-horizon reasoning
observation compression
token cost
environment heterogeneity
Innovation

Methods, ideas, or system contributions that make the work stand out.

self-evolving
observation compression
terminal agents
token efficiency
long-horizon reasoning