Energy per Successful Goal: Goal-Level Energy Accounting for Agentic AI Systems

📅 2026-05-20

🤖 AI Summary

Existing AI energy assessments typically measure a single inference or training run, which misses the energy behavior of goal-oriented agent systems that involve multi-step execution, retries, and recovery. To close this gap, the authors propose the A-LEMS framework and two metrics: Energy per Successful Goal (EpG) and the Orchestration Overhead Index (OOI). A-LEMS combines a cross-layer observation pipeline with a time-bounded attribution model to support end-to-end, reproducible energy evaluation. In the reported experiments, agentic workflows average 888.1 J per successful goal against 205.3 J for linear baselines, a ratio of about 4.33. For tool-augmented tasks, OOI falls below 1.0, meaning agentic execution is cheaper than linear. The authors take this inversion as evidence that the metric reflects orchestration structure rather than a fixed upward bias.

📝 Abstract

Current AI energy benchmarks measure consumption at the granularity of a single model invocation or training run. For classical single-turn workloads this unit remains coherent. For agentic systems, where a single user goal may trigger multi-step orchestration, tool calls, retries, and failure-recovery cycles, the invocation count is an implementation artifact rather than a task property, and inference-level normalization misrepresents the energy cost of goal completion. We present A-LEMS (Agentic LLM Energy Measurement System), a cross-layer measurement framework that redefines the unit of AI energy accounting from energy per inference to Energy per Successful Goal (EpG). EpG aggregates total workflow energy across all execution attempts, including failures and retries, normalized by the number of successfully completed goals. A-LEMS formalizes energy attribution through a temporal boundary model, a five-layer observation pipeline mapping RAPL signals to workflow-level energy, and a reproducibility protocol binding every measurement to hardware and runtime configuration. Building on EpG, we define the Orchestration Overhead Index (OOI), which isolates the energy cost of orchestration relative to linear execution under identical task criteria. Across five reasoning and three tool-augmented task families, agentic workflows consume 4.33x the mean energy per successful goal of linear baselines (888.1 J vs 205.3 J). This overhead is driven by orchestration structure, not inference compute. For tool-augmented tasks, OOI inverts below 1.0x: agentic execution is cheaper than linear, confirming that the metric captures orchestration structure rather than a fixed upward bias. These findings establish that energy-per-inference is insufficient for agentic AI. EpG and OOI provide the measurement foundation for accurate benchmarking, where orchestration structure is the primary determinant of energy cost.
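
The two metrics can be illustrated with a minimal Python sketch. It follows the abstract's description of EpG (total energy over all attempts, failed ones included, divided by the number of successfully completed goals) and assumes that OOI is the ratio of agentic EpG to linear EpG, which matches the reported 4.33x figure but is not stated as a formula in the abstract. The `Attempt` record, both function names, and the sample attempts are hypothetical and are not part of A-LEMS.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    """One execution attempt toward a goal (hypothetical record layout)."""
    goal_id: str
    energy_j: float   # energy attributed to this attempt, in joules
    success: bool


def energy_per_successful_goal(attempts):
    """Total energy of all attempts divided by the number of goals completed.

    Failed attempts and retries still contribute energy to the numerator.
    """
    total_j = sum(a.energy_j for a in attempts)
    completed = {a.goal_id for a in attempts if a.success}
    if not completed:
        raise ValueError("EpG is undefined when no goal succeeded")
    return total_j / len(completed)


def orchestration_overhead_index(epg_agentic, epg_linear):
    """Assumed definition: ratio of agentic EpG to linear EpG."""
    if epg_linear <= 0:
        raise ValueError("linear EpG must be positive")
    return epg_agentic / epg_linear


# Illustrative data only: goal g1 fails once and then succeeds, g2 succeeds.
sample = [
    Attempt("g1", 100.0, False),
    Attempt("g1", 150.0, True),
    Attempt("g2", 200.0, True),
]
sample_epg = energy_per_successful_goal(sample)   # (100 + 150 + 200) / 2

# Mean values reported in the abstract: 888.1 J agentic vs 205.3 J linear.
reported_ooi = orchestration_overhead_index(888.1, 205.3)
```

In the sample, the failed first attempt at `g1` raises EpG to 225 J even though the two successful attempts alone average 175 J, which is the effect that per-inference normalization hides. Under the assumed ratio definition, a value below 1.0 corresponds to the tool-augmented case where agentic execution is cheaper than linear.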

Problem

Research questions and friction points this paper is trying to address.

Agentic AI

Energy Accounting

Goal Completion

Orchestration Overhead

Energy Benchmarking

Innovation

Methods, ideas, or system contributions that make the work stand out.

Energy per Successful Goal

Agentic AI

Orchestration Overhead Index