Cortex: Workflow-Aware Resource Pooling and Scheduling for Agentic Serving

📅 2025-10-15
🤖 AI Summary
In agentic workflow serving, compute and memory interference across stages leads to low KV cache utilization, constrained throughput, and unpredictable performance. To address this, we propose Cortex, the first workflow-aware, isolation-oriented serving architecture for agentic workloads. Its core contributions are: (1) a stage-aware resource pooling and isolation mechanism that allocates dedicated CPU and memory resources to each workflow stage; (2) an elastic "agent-state" cache supporting speculative branch execution, dynamic scheduling, and multi-level sharing; and (3) tight co-optimization of workflow-aware scheduling and KV caching. Evaluation shows that Cortex improves throughput by up to 2.3×, increases the KV cache hit rate by 41%, and significantly reduces tail latency variability, delivering both high efficiency and strong performance determinism for complex agentic applications.

📝 Abstract
We introduce Cortex, a prototype workflow-aware serving platform designed for agentic workloads. The core principle of Cortex is stage isolation: it provisions dedicated resource pools for each distinct stage of an agentic workflow. This simple yet powerful strategy mitigates inter-stage interference in compute and memory, leading to better KV cache utilization, higher throughput, and more predictable performance. By customizing resource allocation and scheduling within each distinct stage of agentic workflows, Cortex lays the groundwork for more advanced, agent-native serving paradigms, including malleable resource management, speculative execution of workflow branches, and a shared, multi-tiered cache for "agentic state."
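
The stage-isolation idea can be illustrated with a toy simulation. The sketch below is not Cortex's implementation: the `LRUCache` class, the stage names `planner` and `coder`, the prefix labels, the pool capacities, and the use of LRU eviction are all assumptions made for illustration, since the abstract does not specify them. It replays the same request trace against one shared cache and against per-stage caches with the same total capacity, and counts cache hits.

```python
from collections import OrderedDict


class LRUCache:
    """Toy LRU cache standing in for a KV-cache pool (hypothetical model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0

    def lookup(self, prefix):
        """Return True on a hit; on a miss, admit the prefix and evict the LRU entry if full."""
        if prefix in self.entries:
            self.entries.move_to_end(prefix)
            self.hits += 1
            return True
        self.entries[prefix] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)
        return False


def replay(trace, pools):
    """Route each (stage, prefix) request to that stage's pool and return total hits."""
    for stage, prefix in trace:
        pools[stage].lookup(prefix)
    # Count each distinct cache object once, since stages may share a pool.
    return sum(cache.hits for cache in {id(c): c for c in pools.values()}.values())


# Assumed workload: the planner stage reuses one hot prefix, while the
# coder stage cycles through three prefixes and churns the cache.
trace = [("planner", "P"), ("coder", "C1"), ("coder", "C2"), ("coder", "C3")] * 3

# Shared pool: both stages contend for the same 3 slots.
shared = LRUCache(capacity=3)
shared_hits = replay(trace, {"planner": shared, "coder": shared})

# Isolated pools: same total capacity (1 + 2), but one pool per stage.
isolated_hits = replay(trace, {"planner": LRUCache(1), "coder": LRUCache(2)})

print("shared hits:", shared_hits)
print("isolated hits:", isolated_hits)
```

In this contrived trace the shared pool never hits, because the coder stage's churn evicts the planner's hot prefix before it is reused, whereas the isolated planner pool retains it. Real workloads and eviction policies will behave differently; the point is only to show the kind of inter-stage interference that dedicated pools are meant to avoid.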
Problem

Research questions and friction points this paper is trying to address.

Mitigates inter-stage interference in agentic workflows
Enhances KV cache utilization and throughput performance
Enables advanced agent-native serving paradigms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Stage isolation for dedicated resource pools
Customized resource allocation per workflow stage
Multi-tiered cache for agentic state sharing
Authors
Nikos Pagonas
Columbia University
Yeounoh Chung
Google
ML, Gen AI, data management, data analytics, database
Kostis Kaffes
Columbia University
Arvind Krishnamurthy
Google & Univ. of Washington