🤖 AI Summary
This work addresses the high API costs and latency incurred by large language models in long-context coding tasks, where existing compression methods often lack task awareness and risk disrupting code structure or discarding critical information. Inspired by human programmers' "selective skimming" behavior, the authors propose an adaptive context pruning framework that introduces, for the first time, an explicit task-aware target prompting mechanism. This mechanism guides a lightweight (0.6B-parameter) neural skimmer to dynamically evaluate and retain only the most relevant code lines. Evaluated on four benchmarks including SWE-Bench Verified, the method achieves token reductions of 23%–54% on agent tasks and up to 14.84× compression on LongCodeQA, with minimal performance degradation, demonstrating a strong balance between compression efficiency and semantic fidelity.
📝 Abstract
LLM agents have demonstrated remarkable capabilities in software development, but their performance is hampered by long interaction contexts, which incur high API costs and latency. While various context compression approaches such as LongLLMLingua have emerged to tackle this challenge, they typically rely on fixed metrics such as PPL, ignoring the task-specific nature of code understanding. As a result, they frequently disrupt syntactic and logical structure and fail to retain critical implementation details. In this paper, we propose SWE-Pruner, a self-adaptive context pruning framework tailored for coding agents. Drawing inspiration from how human programmers "selectively skim" source code during development and debugging, SWE-Pruner performs task-aware adaptive pruning for long contexts. Given the current task, the agent formulates an explicit goal (e.g., "focus on error handling") as a hint to guide the pruning targets. A lightweight neural skimmer (0.6B parameters) is trained to dynamically select relevant lines from the surrounding context given the goal. Evaluations across four benchmarks and multiple models validate SWE-Pruner's effectiveness in various scenarios, achieving 23-54% token reduction on agent tasks like SWE-Bench Verified while even improving success rates, and up to 14.84x compression on single-turn tasks like LongCodeQA with minimal performance impact.
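To make the pruning idea concrete, below is a minimal sketch of goal-conditioned, line-level context pruning as described in the abstract. The names (`LineSkimmer`, `prune_context`), the keyword-overlap scorer, and the keep-ratio thresholding are illustrative assumptions, not the authors' actual 0.6B skimmer or training setup.

```python
# Hypothetical sketch: prune a long code context by keeping only the lines
# most relevant to an explicit task goal (e.g., "focus on error handling").
from typing import List


class LineSkimmer:
    """Stand-in for the lightweight neural skimmer: scores each context line
    for relevance to the given goal. A real skimmer would be a small LM;
    here a trivial substring-overlap heuristic serves as a placeholder."""

    def score(self, goal: str, lines: List[str]) -> List[float]:
        goal_tokens = set(goal.lower().split())
        return [
            sum(tok in line.lower() for tok in goal_tokens) / (len(goal_tokens) or 1)
            for line in lines
        ]


def prune_context(goal: str, context: str, skimmer: LineSkimmer,
                  keep_ratio: float = 0.5) -> str:
    """Keep the highest-scoring lines, preserving their original order."""
    lines = context.splitlines()
    scores = skimmer.score(goal, lines)
    k = max(1, int(len(lines) * keep_ratio))
    keep = set(sorted(range(len(lines)), key=lambda i: scores[i], reverse=True)[:k])
    return "\n".join(line for i, line in enumerate(lines) if i in keep)


# Toy usage: the agent states a goal and prunes a file before sending it to the LLM.
example_context = "\n".join([
    "def load_config(path):",
    "    with open(path) as f:",
    "        return json.load(f)",
    "def handle_request(req):",
    "    try:",
    "        return process(req)",
    "    except ValueError as err:",
    "        log.error('bad request: %s', err)",
    "        raise",
])
print(prune_context("focus on error handling", example_context, LineSkimmer()))
```

Selecting whole lines rather than individual tokens mirrors the paper's stated goal of preserving the syntactic and logical structure that fixed, perplexity-based compression tends to disrupt.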