Tokenomics: Quantifying Where Tokens Are Used in Agentic Software Engineering

📅 2026-01-20
📈 Citations: 1
Influential: 0
🤖 AI Summary
This study addresses the limitations of large language model (LLM)-based multi-agent systems in software engineering, particularly the lack of transparency in resource consumption, unpredictability of costs, and unclear environmental impact. To this end, it introduces the first standardized token consumption evaluation framework tailored for agent-based software engineering. By analyzing execution trajectories from the ChatDev framework across 30 development tasks, the work maps internal agent interactions to standard software engineering phases—design, coding, completion, code review, testing, and documentation—and quantifies the distribution of input, output, and reasoning tokens across these stages. The analysis reveals that the code review phase alone accounts for 59.4% of total token usage, with input tokens comprising 53.9% of the total, indicating that cost is primarily driven by automated refinement and validation rather than initial code generation. These findings provide empirical foundations for optimizing workflows, forecasting costs, and designing efficient agent collaboration protocols.

📝 Abstract
LLM-based Multi-Agent (LLM-MA) systems are increasingly applied to automate complex software engineering tasks such as requirements engineering, code generation, and testing. However, their operational efficiency and resource consumption remain poorly understood, hindering practical adoption due to unpredictable costs and environmental impact. To address this, we conduct an analysis of token consumption patterns in an LLM-MA system within the Software Development Life Cycle (SDLC), aiming to understand where tokens are consumed across distinct software engineering activities. We analyze execution traces from 30 software development tasks performed by the ChatDev framework using a GPT-5 reasoning model, mapping its internal phases to distinct development stages (Design, Coding, Code Completion, Code Review, Testing, and Documentation) to create a standardized evaluation framework. We then quantify and compare token distribution (input, output, reasoning) across these stages. Our preliminary findings show that the iterative Code Review stage accounts for the majority of token consumption, averaging 59.4% of tokens. Furthermore, we observe that input tokens consistently constitute the largest share of consumption, averaging 53.9%, providing empirical evidence for potentially significant inefficiencies in agentic collaboration. Our results suggest that the primary cost of agentic software engineering lies not in initial code generation but in automated refinement and verification. Our novel methodology can help practitioners predict expenses and optimize workflows, and it directs future research toward developing more token-efficient agent collaboration protocols.
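The staged breakdown the abstract describes can be sketched as a simple aggregation over per-message token records. The record schema and the phase-to-stage mapping below are illustrative assumptions for the sketch, not the paper's actual ChatDev trace format:

```python
from collections import defaultdict

# Hypothetical mapping from internal agent phases to SDLC stages
# (the paper maps ChatDev's phases to six such stages; these phase
# names are placeholders).
PHASE_TO_STAGE = {
    "DemandAnalysis": "Design",
    "Coding": "Coding",
    "CodeComplete": "Code Completion",
    "CodeReview": "Code Review",
    "Test": "Testing",
    "EnvironmentDoc": "Documentation",
}

def stage_token_shares(records):
    """Aggregate per-message token counts into per-stage totals and
    percentage shares of overall consumption. Each record is assumed
    to carry a 'phase' name plus 'input'/'output'/'reasoning' counts."""
    totals = defaultdict(lambda: {"input": 0, "output": 0, "reasoning": 0})
    for rec in records:
        stage = PHASE_TO_STAGE.get(rec["phase"], "Other")
        for kind in ("input", "output", "reasoning"):
            totals[stage][kind] += rec.get(kind, 0)
    grand_total = sum(sum(t.values()) for t in totals.values())
    shares = {
        stage: round(100 * sum(t.values()) / grand_total, 1)
        for stage, t in totals.items()
    }
    return dict(totals), shares

# Toy trajectory: three messages from two phases.
records = [
    {"phase": "Coding", "input": 800, "output": 400, "reasoning": 200},
    {"phase": "CodeReview", "input": 2000, "output": 500, "reasoning": 300},
    {"phase": "CodeReview", "input": 1500, "output": 200, "reasoning": 100},
]
totals, shares = stage_token_shares(records)
# In this toy data, Code Review dominates (76.7% of 6,000 tokens),
# mirroring the paper's finding that review, not generation, drives cost.
```

The same aggregation, run over real execution traces, would also expose the input-token share directly by summing the `input` column across stages.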
Problem

Research questions and friction points this paper is trying to address.

Tokenomics
LLM-based Multi-Agent
Software Engineering
Resource Consumption
Token Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tokenomics
LLM-based Multi-Agent Systems
Software Development Life Cycle
Token Consumption Analysis
Agentic Software Engineering
Mohamad Salim
Data-driven Analysis of Software (DAS) Lab, Concordia University, Montreal, Canada

Jasmine Latendresse
Concordia University (software engineering, mining software repositories)

S. Khatoonabadi
Data-driven Analysis of Software (DAS) Lab, Concordia University, Montreal, Canada

Emad Shihab
Professor at Concordia University (Software Engineering, SE4AI, Mining Software Repositories, Software Analytics, Software Supply Chain)