Agentic AI Workload Characteristics

📅 2026-05-25

🤖 AI Summary

Traditional large language model (LLM) serving architectures are poorly suited to agentic AI workloads, which are stateful, span multiple turns, and interleave model calls with tool invocations. This work builds an end-to-end tracing system to characterize both the LLM invocations and the tool executions of ReAct-style agents, using reasoning and non-reasoning Gemma and Qwen configurations on five agentic benchmarks. It describes this as the first such analysis to date. The study finds that agentic workloads are not simply long-prompt workloads: with effective context caching, most input tokens are reused across turns, so execution becomes decode-dominated and depends heavily on long-lived key-value (KV) cache state. Tool use also follows a phased pattern, shifting from exploration early in a run to execution later. These findings point to concrete optimization directions for serving systems tailored to agentic AI applications.
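Why heavy input reuse makes an agent run decode-dominated can be seen with simple token accounting. The sketch below is a hypothetical model, not the paper's tracing infrastructure. It assumes that each LLM call sees the entire prior context, that the KV cache for every previously seen or decoded token stays resident (perfect prefix caching), and that prefill and decode run at fixed, made-up throughputs (`PREFILL_TOK_PER_S`, `DECODE_TOK_PER_S`). The `Turn` sizes are illustrative numbers, not measurements.

```python
from dataclasses import dataclass

# Hypothetical throughputs, chosen only to reflect that prefill processes many
# tokens in parallel while decode emits one token per forward pass.
PREFILL_TOK_PER_S = 5000.0
DECODE_TOK_PER_S = 50.0


@dataclass
class Turn:
    new_input_tokens: int  # tokens appended before this call (task prompt, then tool output)
    output_tokens: int     # tokens the model decodes in this call


def token_accounting(turns):
    """Tally input reuse and prefill vs. decode cost for one multi-turn agent run."""
    context = 0       # tokens already in context, assumed resident in the KV cache
    total_input = 0   # prompt tokens presented to the model, summed over calls
    cached_input = 0  # prompt tokens whose KV entries are reused from the cache
    decode = 0        # generated tokens, summed over calls
    for t in turns:
        prompt = context + t.new_input_tokens
        total_input += prompt
        cached_input += context
        decode += t.output_tokens
        # Decoded tokens become part of the context for the next call.
        context = prompt + t.output_tokens
    prefill = total_input - cached_input  # only uncached tokens need prefill
    prefill_s = prefill / PREFILL_TOK_PER_S
    decode_s = decode / DECODE_TOK_PER_S
    return {
        "total_input": total_input,
        "cached_input": cached_input,
        "prefill": prefill,
        "decode": decode,
        "reuse_fraction": cached_input / total_input,
        "decode_time_share": decode_s / (prefill_s + decode_s),
    }


# Illustrative run: a 2,000-token task prompt followed by three tool observations.
turns = [Turn(2000, 300), Turn(500, 200), Turn(500, 200), Turn(500, 400)]
report = token_accounting(turns)
print(report)
```

For these numbers, 9,000 of the 12,500 input tokens (72%) come from the cache and only 3,500 need prefill. Under the assumed throughputs, decode accounts for roughly 97% of model time, even though every prompt after the first is longer than the task prompt.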

📝 Abstract

Agentic AI shifts LLM serving from isolated prompt-generation requests to stateful, multi-turn executions that repeatedly invoke the model, call tools, and grow context over time. This paper characterizes ReAct-style agents from both the LLM-serving and tool-execution perspectives using an end-to-end tracing infrastructure across reasoning and non-reasoning Gemma and Qwen configurations on five agentic benchmarks. Our study shows that agentic workloads are not simply long-prompt workloads: with effective context caching, most input tokens are reused across turns, making execution decode-dominated while increasing dependence on long-lived KV-cache state. We also find that tool use has a clear temporal structure, with agents shifting from read/explore behavior early in execution to execute/write behavior later. These results show that efficient agentic serving must jointly manage repeated model re-entry, persistent context state, and workload-dependent tool behavior.
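A shift from read/explore to execute/write behavior can be quantified from a tool-call trace by bucketing each call according to its relative position in the run. The sketch below is hypothetical. The tool names, their mapping in `TOOL_CATEGORY`, and the use of equal-width position bins are illustrative assumptions, not the paper's taxonomy or tracing schema.

```python
from collections import Counter

# Hypothetical mapping from tool names to the two behavior classes.
TOOL_CATEGORY = {
    "list_dir": "read/explore",
    "read_file": "read/explore",
    "search": "read/explore",
    "edit_file": "execute/write",
    "write_file": "execute/write",
    "run_tests": "execute/write",
}


def phase_profile(trace, num_bins=3):
    """Share of execute/write calls in each equal-width slice of a run's tool calls."""
    counts = [Counter() for _ in range(num_bins)]
    n = len(trace)
    for i, tool in enumerate(trace):
        # Map call index i in [0, n) to a bin in [0, num_bins).
        b = i * num_bins // n
        counts[b][TOOL_CATEGORY.get(tool, "other")] += 1
    shares = []
    for c in counts:
        total = sum(c.values())
        shares.append(c["execute/write"] / total if total else 0.0)
    return shares


# Made-up trace for a single agent run, in call order.
trace = ["list_dir", "search", "read_file",
         "read_file", "search", "edit_file",
         "read_file", "run_tests", "edit_file"]
shares = phase_profile(trace)
print(shares)
```

In this made-up trace, the execute/write share rises from 0 in the first third of the run to 2/3 in the last third, the kind of temporal pattern the study reports. Averaging the per-bin shares over many runs would give a workload-level profile.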

Problem

Research questions and friction points this paper is trying to address.

Agentic AI

LLM serving

stateful execution

context management

tool use

Innovation

Methods, ideas, or system contributions that make the work stand out.

Agentic AI

LLM serving

context caching