A CPU-Centric Perspective on Agentic AI

📅 2025-11-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study identifies critical CPU-side bottlenecks in agentic AI workloads: tool invocation latency accounts for up to 90.6% of end-to-end latency, and dynamic CPU energy consumption reaches 44% of total dynamic energy at large batch sizes. Motivated by the characteristics of agentic AI (complex decision orchestration, dynamic inference paths, and high procedural redundancy), the paper proposes two optimizations: CPU and GPU-Aware Micro-batching (CGAM) for homogeneous workloads and Mixed Agentic Workload Scheduling (MAWS) for heterogeneous workloads, enabling fine-grained CPU-GPU co-execution. Evaluated across five representative task categories (Haystack RAG, Toolformer, ChemCrow, LangChain, and SWE-Agent), the system-level measurements demonstrate up to a 2.1× reduction in P50 latency, alongside improvements in throughput and energy efficiency. To the authors' knowledge, this is the first work to systematically characterize and alleviate system-level performance bottlenecks in agentic AI from a CPU-centric perspective.

📝 Abstract
Agentic AI frameworks add a decision-making orchestrator, embedded with external tools such as web search, a Python interpreter, and a contextual database, on top of monolithic LLMs, turning them from passive text oracles into autonomous problem-solvers that can plan, call tools, remember past steps, and adapt on the fly. This paper aims to characterize and understand the system bottlenecks introduced by agentic AI workloads from a largely overlooked CPU-centric perspective. We first systematically characterize agentic AI on the basis of the orchestrator/decision-making component, inference-path dynamics, and the repetitiveness of the agentic flow, all of which directly influence system-level performance. Based on this characterization, we choose five representative agentic AI workloads (Haystack RAG, Toolformer, ChemCrow, LangChain, and SWE-Agent) to profile latency, throughput, and energy metrics and to demystify the significant impact of CPUs on these metrics relative to GPUs. We observe that: (1) tool processing on CPUs can take up to 90.6% of the total latency; (2) agentic throughput gets bottlenecked either by CPU factors (coherence, synchronization, and over-subscription of cores) or by GPU factors (main-memory capacity and bandwidth); (3) CPU dynamic energy consumes up to 44% of the total dynamic energy at large batch sizes. Based on the profiling insights, we present two key optimizations, CPU and GPU-Aware Micro-batching (CGAM) and Mixed Agentic Workload Scheduling (MAWS), for homogeneous and heterogeneous agentic workloads respectively, demonstrating the potential to improve the performance, efficiency, and scalability of agentic AI. We achieve up to 2.1× and 1.41× P50 latency speedups over the multi-processing baseline for homogeneous and heterogeneous agentic workloads respectively.
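The micro-batching idea behind CGAM can be sketched as a simple pipeline: instead of running GPU inference for a whole batch and then tool processing for the whole batch serially, the batch is split into micro-batches so that CPU-side tool execution of one micro-batch overlaps GPU inference of the next. The sketch below is illustrative only, not the paper's implementation; `gpu_infer` and `cpu_tool_step` are hypothetical stand-ins for an LLM decode step and a tool call.

```python
# Illustrative sketch of CPU-GPU micro-batch pipelining (not the paper's code).
# gpu_infer and cpu_tool_step are hypothetical placeholders; in a real agentic
# system they would be a GPU LLM forward pass and a CPU tool invocation.
from concurrent.futures import ThreadPoolExecutor

def gpu_infer(micro_batch):
    # Placeholder for a GPU-side inference step producing tool-call plans.
    return [f"plan({x})" for x in micro_batch]

def cpu_tool_step(plans):
    # Placeholder for CPU-side tool execution (the dominant latency component).
    return [f"result({p})" for p in plans]

def pipelined_run(requests, micro_batch_size=4):
    """Overlap CPU tool processing of micro-batch i with GPU inference of
    micro-batch i+1, instead of serializing the two stages over a full batch."""
    chunks = [requests[i:i + micro_batch_size]
              for i in range(0, len(requests), micro_batch_size)]
    results, pending = [], None
    with ThreadPoolExecutor(max_workers=1) as cpu_pool:
        for chunk in chunks:
            plans = gpu_infer(chunk)              # GPU stage for this chunk
            if pending is not None:
                results.extend(pending.result())  # join previous CPU stage
            pending = cpu_pool.submit(cpu_tool_step, plans)  # async CPU stage
        if pending is not None:
            results.extend(pending.result())      # drain the last micro-batch
    return results
```

With real stages, the GPU stays busy on micro-batch i+1 while the thread pool runs tools for micro-batch i, which is the co-execution the summary describes.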
Problem

Research questions and friction points this paper is trying to address.

Analyzing CPU bottlenecks in agentic AI system performance
Profiling latency and energy impacts of CPU-centric workloads
Optimizing CPU-GPU coordination for agentic AI efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

CPU-GPU micro-batching optimizes latency and throughput
Mixed workload scheduling improves heterogeneous agentic efficiency
CPU-centric profiling reveals tool processing as major bottleneck
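For heterogeneous mixes, the MAWS idea amounts to classifying agentic tasks by their dominant resource and interleaving them so neither the CPU nor the GPU idles. The following is a minimal, assumption-laden sketch (the `kind` and `name` task fields are hypothetical), not the paper's scheduler:

```python
# Hypothetical sketch of mixed-workload scheduling: classify each agentic task
# as CPU-bound (tool-heavy) or GPU-bound (inference-heavy), then interleave
# the two queues so both resources stay busy. Task fields are assumptions.
from collections import deque

def schedule(tasks):
    """Return a round-robin dispatch order that alternates GPU-bound and
    CPU-bound tasks, keeping the GPU fed while CPU tool work overlaps."""
    cpu_q = deque(t for t in tasks if t["kind"] == "cpu")
    gpu_q = deque(t for t in tasks if t["kind"] == "gpu")
    order = []
    while cpu_q or gpu_q:
        if gpu_q:
            order.append(gpu_q.popleft()["name"])  # keep the GPU fed first
        if cpu_q:
            order.append(cpu_q.popleft()["name"])  # overlap a CPU-bound task
    return order
```

A production scheduler would also weigh queue depth, memory pressure, and per-task latency estimates; this sketch only shows the interleaving principle.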