Tokenomics: Quantifying Where Tokens Are Used in Agentic Software Engineering

📅 2026-01-20
📈 Citations: 1
Influential: 0
🤖 AI Summary
This study addresses the limitations of large language model (LLM)-based multi-agent systems in software engineering, particularly the lack of transparency in resource consumption, unpredictability of costs, and unclear environmental impact. To this end, it introduces the first standardized token consumption evaluation framework tailored for agent-based software engineering. By analyzing execution trajectories from the ChatDev framework across 30 development tasks, the work maps internal agent interactions to standard software engineering phases—design, coding, completion, code review, testing, and documentation—and quantifies the distribution of input, output, and reasoning tokens across these stages. The analysis reveals that the code review phase alone accounts for 59.4% of total token usage, with input tokens comprising 53.9% of the total, indicating that cost is primarily driven by automated refinement and validation rather than initial code generation. These findings provide empirical foundations for optimizing workflows, forecasting costs, and designing efficient agent collaboration protocols.

📝 Abstract
LLM-based Multi-Agent (LLM-MA) systems are increasingly applied to automate complex software engineering tasks such as requirements engineering, code generation, and testing. However, their operational efficiency and resource consumption remain poorly understood, hindering practical adoption due to unpredictable costs and environmental impact. To address this, we conduct an analysis of token consumption patterns in an LLM-MA system within the Software Development Life Cycle (SDLC), aiming to understand where tokens are consumed across distinct software engineering activities. We analyze execution traces from 30 software development tasks performed by the ChatDev framework using a GPT-5 reasoning model, mapping its internal phases to distinct development stages (Design, Coding, Code Completion, Code Review, Testing, and Documentation) to create a standardized evaluation framework. We then quantify and compare token distribution (input, output, reasoning) across these stages. Our preliminary findings show that the iterative Code Review stage accounts for the majority of token consumption, averaging 59.4% of tokens. Furthermore, we observe that input tokens consistently constitute the largest share of consumption, averaging 53.9%, providing empirical evidence for potentially significant inefficiencies in agentic collaboration. Our results suggest that the primary cost of agentic software engineering lies not in initial code generation but in automated refinement and verification. Our novel methodology can help practitioners predict expenses and optimize workflows, and it directs future research toward developing more token-efficient agent collaboration protocols.
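The staged breakdown the abstract describes can be sketched as a simple aggregation over per-message token records. The record schema and the phase-to-stage mapping below are illustrative assumptions for the sketch, not the paper's actual ChatDev trace format:

```python
from collections import defaultdict

# Hypothetical mapping from internal agent phases to SDLC stages
# (the paper maps ChatDev's phases to six such stages; these phase
# names are placeholders).
PHASE_TO_STAGE = {
    "DemandAnalysis": "Design",
    "Coding": "Coding",
    "CodeComplete": "Code Completion",
    "CodeReview": "Code Review",
    "Test": "Testing",
    "EnvironmentDoc": "Documentation",
}

def stage_token_shares(records):
    """Aggregate per-message token counts into per-stage totals and
    percentage shares of overall consumption. Each record is assumed
    to carry a 'phase' name plus 'input'/'output'/'reasoning' counts."""
    totals = defaultdict(lambda: {"input": 0, "output": 0, "reasoning": 0})
    for rec in records:
        stage = PHASE_TO_STAGE.get(rec["phase"], "Other")
        for kind in ("input", "output", "reasoning"):
            totals[stage][kind] += rec.get(kind, 0)
    grand_total = sum(sum(t.values()) for t in totals.values())
    shares = {
        stage: round(100 * sum(t.values()) / grand_total, 1)
        for stage, t in totals.items()
    }
    return dict(totals), shares

# Toy trajectory: three messages from two phases.
records = [
    {"phase": "Coding", "input": 800, "output": 400, "reasoning": 200},
    {"phase": "CodeReview", "input": 2000, "output": 500, "reasoning": 300},
    {"phase": "CodeReview", "input": 1500, "output": 200, "reasoning": 100},
]
totals, shares = stage_token_shares(records)
# In this toy data, Code Review dominates (76.7% of 6,000 tokens),
# mirroring the paper's finding that review, not generation, drives cost.
```

The same aggregation, run over real execution traces, would also expose the input-token share directly by summing the `input` column across stages.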
Problem

Research questions and friction points this paper is trying to address.

Tokenomics
LLM-based Multi-Agent
Software Engineering
Resource Consumption
Token Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tokenomics
LLM-based Multi-Agent Systems
Software Development Life Cycle
Token Consumption Analysis
Agentic Software Engineering
Mohamad Salim
Data-driven Analysis of Software (DAS) Lab, Concordia University, Montreal, Canada

Jasmine Latendresse
Concordia University (software engineering, mining software repositories)

S. Khatoonabadi
Data-driven Analysis of Software (DAS) Lab, Concordia University, Montreal, Canada

Emad Shihab
Professor at Concordia University (Software Engineering, SE4AI, Mining Software Repositories, Software Analytics, Software Supply Chain)