🤖 AI Summary
This work addresses the energy overhead that multi-agent software engineering systems incur when agents generate redundant output tokens while repeatedly exploring the same regions of a codebase. Its per-token energy attribution shows that an output token consumes 30 to 1,000 times more energy than an input or cached token. To mitigate this inefficiency, the authors propose Librarian, a persistent search sub-agent that tracks repository-search history across agents, suppresses redundant exploration, and returns short references to file regions instead of full file excerpts. Evaluated on the SWE-Bench Verified benchmark, the approach reduces per-episode GPU energy consumption by up to 25% while preserving task performance. The core contributions are a history-aware redundancy suppression mechanism, a fine-grained energy attribution analysis, and an efficient reference generation strategy.
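The mechanism summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Librarian` and `Reference` classes, the `path:start-end` reference format, exact-substring matching, and the in-memory toy repository are all assumptions made for illustration. The sketch shows only the two ideas named in the summary: a search history shared by all agents, and short references returned in place of file excerpts.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Reference:
    """Lightweight pointer to a file region (hypothetical format)."""
    path: str
    start: int
    end: int

    def __str__(self) -> str:
        return f"{self.path}:{self.start}-{self.end}"


@dataclass
class Librarian:
    """Toy shared search memory; an illustrative assumption, not the paper's code."""
    files: dict[str, list[str]]                      # path -> lines of the file
    history: dict[str, list[Reference]] = field(default_factory=dict)
    hits: int = 0                                    # repeated queries served from history

    def search(self, query: str) -> list[Reference]:
        # A query already issued by any agent is answered from history,
        # so the repository is not explored again.
        if query in self.history:
            self.hits += 1
            return self.history[query]
        # First occurrence: scan the files and record references, not excerpts.
        refs = [
            Reference(path, lineno, lineno)
            for path, lines in self.files.items()
            for lineno, line in enumerate(lines, start=1)
            if query in line
        ]
        self.history[query] = refs
        return refs


# Hypothetical two-file repository queried by two different agents.
repo = {
    "app/models.py": ["class User:", "    def save(self):", "        pass"],
    "app/views.py": ["def save(request):", "    return None"],
}
lib = Librarian(repo)
first = lib.search("def save")    # agent A: triggers a scan
second = lib.search("def save")   # agent B: served from shared history
print([str(r) for r in first])
print(lib.hits)
```

In this toy version the second agent receives the same reference list without a new scan, and each result is a short `path:start-end` string rather than the matching source text.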
📝 Abstract
Multi-agent systems (MAS) have substantially advanced autonomous software engineering (SWE), but their growing inference energy demands raise sustainability concerns. In this paper, we demonstrate that this cost is concentrated in an overlooked source: redundant output tokens generated across agents. Two empirical findings ground this claim. First, our per-token energy attribution for MAS reveals a sharp asymmetry: an output token consumes 30 to 1,000 times more energy than an input or cached token. Second, MAS inflate per-episode output because agents repeatedly re-explore overlapping repository regions. To address this inefficiency, we propose Librarian, a persistent search sub-agent that tracks repository-search history and suppresses redundant exploration actions across agents. By returning short references to file regions instead of full file excerpts, Librarian further reduces output-token volume. On SWE-Bench Verified, Librarian reduces per-episode GPU energy consumption of existing multi-agent SWE systems by up to 25% while preserving task performance.
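To see why the output-token asymmetry matters, the following back-of-the-envelope Python sketch computes the share of episode energy attributable to output tokens. Only the 30 to 1,000 ratio range comes from the abstract. The token counts are hypothetical, and the model assumes that input and cached tokens cost the same one unit each, which is a simplification for illustration rather than the paper's attribution method.

```python
def output_share(n_context: int, n_output: int, ratio: float) -> float:
    """Fraction of relative energy spent on output tokens.

    n_context: input plus cached tokens, each costing 1 unit (assumption).
    n_output:  output tokens, each costing `ratio` units.
    """
    output_energy = n_output * ratio
    total_energy = n_context + output_energy
    return output_energy / total_energy


# Hypothetical episode: 100,000 input/cached tokens and 2,000 output tokens.
low = output_share(100_000, 2_000, ratio=30)      # lower end of the reported range
high = output_share(100_000, 2_000, ratio=1_000)  # upper end of the reported range
print(round(low, 3))   # 0.375
print(round(high, 3))  # 0.952
```

Under these assumed counts, output tokens make up about 2% of the tokens but between roughly 38% and 95% of the relative energy, which is why trimming redundant output can lower total energy noticeably.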