Cortex: Workflow-Aware Resource Pooling and Scheduling for Agentic Serving

📅 2025-10-15

🤖 AI Summary

In agentic workflow serving, compute and memory interference across workflow stages leads to low KV cache utilization, limited throughput, and unpredictable performance. To address this, the paper proposes Cortex, a prototype workflow-aware, isolation-oriented serving architecture for agentic workloads. Its core contributions are: (1) a stage-aware resource pooling and isolation mechanism that provisions dedicated compute and memory resources for each workflow stage; (2) an elastic "agentic state" cache supporting speculative branch execution, dynamic scheduling, and multi-level sharing; and (3) tight co-optimization of workflow-aware scheduling and KV caching. Evaluation shows Cortex improves throughput by up to 2.3×, increases the KV cache hit rate by 41%, and significantly reduces tail latency variability, delivering both high efficiency and predictable performance for complex agent applications.
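
Speculative branch execution can be illustrated with a minimal asyncio sketch. It assumes a hypothetical workflow in which a router stage picks one of two branches; `route`, `search_branch`, and `summarize_branch` are illustrative stand-ins for LLM or tool calls (the sleeps only simulate latency), not Cortex's API. Every candidate branch starts before the router decides, and the losing branches are cancelled or their results discarded.

```python
import asyncio

# Hypothetical workflow: a router stage picks one of two branches.
# Each coroutine stands in for an LLM or tool call; sleeps simulate latency.


async def route(query: str) -> str:
    await asyncio.sleep(0.05)  # router decision takes ~50 ms
    return "search" if "?" in query else "summarize"


async def search_branch(query: str) -> str:
    await asyncio.sleep(0.03)
    return f"search results for {query!r}"


async def summarize_branch(query: str) -> str:
    await asyncio.sleep(0.03)
    return f"summary of {query!r}"


BRANCHES = {"search": search_branch, "summarize": summarize_branch}


async def run_speculative(query: str) -> tuple[str, str]:
    # Launch every candidate branch before the router has decided.
    tasks = {name: asyncio.create_task(fn(query)) for name, fn in BRANCHES.items()}
    chosen = await route(query)
    losers = [task for name, task in tasks.items() if name != chosen]
    for task in losers:
        task.cancel()  # no-op if the branch already finished; its result is discarded
    # Let cancelled branches unwind so no task is left pending.
    await asyncio.gather(*losers, return_exceptions=True)
    return chosen, await tasks[chosen]


print(asyncio.run(run_speculative("what is Cortex?")))
# ('search', "search results for 'what is Cortex?'")
```

The trade-off this sketch shows is that end-to-end latency becomes roughly the maximum of the router and branch latencies rather than their sum, at the cost of compute spent on branches that are thrown away.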

📝 Abstract

We introduce Cortex, a prototype workflow-aware serving platform designed for agentic workloads. The core principle of Cortex is stage isolation: it provisions dedicated resource pools for each distinct stage of an agentic workflow. This simple yet powerful strategy mitigates inter-stage interference in compute and memory, leading to better KV cache utilization, higher throughput, and more predictable performance. By customizing resource allocation and scheduling within each distinct stage of agentic workflows, Cortex lays the groundwork for more advanced, agent-native serving paradigms, including malleable resource management, speculative execution of workflow branches, and a shared, multi-tiered cache for "agentic state."
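
The memory side of inter-stage interference can be made concrete with a toy simulation. This is not Cortex's mechanism. It assumes that each stage reuses prompt prefixes (for example, a stage-specific system prompt) and models each pool's KV cache as a small LRU of prefixes. The stage names `plan` and `tool`, the workload, and the capacities are hypothetical. Both setups get the same total capacity of two entries.

```python
from collections import OrderedDict


class PrefixCache:
    """Toy LRU cache of prompt prefixes, standing in for one pool's KV cache."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, prefix: str) -> bool:
        if prefix in self.entries:
            self.entries.move_to_end(prefix)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        self.entries[prefix] = None  # "prefill" and cache the prefix
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return False


def run(workload, caches):
    """Route each (stage, prefix) request to the cache serving that stage."""
    for stage, prefix in workload:
        caches[stage].lookup(prefix)


# Hypothetical workload: the planner reuses one prefix, the tool stage alternates two.
workload = [("plan", "P"), ("tool", "T1"), ("tool", "T2")] * 4

# Shared pool: both stages contend for one cache of 2 entries.
shared = PrefixCache(capacity=2)
run(workload, {"plan": shared, "tool": shared})

# Isolated pools: same total capacity, split 1 + 1 per stage.
isolated = {"plan": PrefixCache(capacity=1), "tool": PrefixCache(capacity=1)}
run(workload, isolated)

print("shared hits:", shared.hits, "/", len(workload))  # shared hits: 0 / 12
print("isolated hits:", sum(c.hits for c in isolated.values()), "/", len(workload))  # isolated hits: 3 / 12
```

In the shared setup, the tool stage's alternating prefixes evict the planner's prefix every round, so all 12 lookups miss. With isolation, the planner's prefix stays resident and hits on 3 of its 4 lookups, while the tool stage misses as before. Real KV caches are block-granular and far more nuanced. The sketch only shows the kind of cross-stage eviction that stage isolation is meant to prevent.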

Problem

Research questions and friction points this paper is trying to address.

Mitigates inter-stage interference in agentic workflows

Enhances KV cache utilization and throughput performance

Enables advanced agent-native serving paradigms

Innovation

Methods, ideas, or system contributions that make the work stand out.

Stage isolation for dedicated resource pools

Customized resource allocation per workflow stage

Multi-tiered cache for agentic state sharing
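
The summary does not specify how the multi-tiered "agentic state" cache is organized. The following is a minimal sketch that assumes a two-tier LRU design, with demotion on eviction and promotion on hit. The class name `TieredStateCache`, the tier roles (a small fast tier such as GPU memory and a larger slower tier such as host memory), and the capacities are hypothetical.

```python
from collections import OrderedDict
from typing import Optional


class TieredStateCache:
    """Toy two-tier LRU cache for agent state (e.g., KV blocks, tool results).

    Tier 0 is small and fast; tier 1 is larger and slower. Entries evicted
    from tier 0 are demoted to tier 1, and tier-1 hits are promoted to tier 0.
    """

    def __init__(self, hot_capacity: int, cold_capacity: int):
        self.hot = OrderedDict()
        self.cold = OrderedDict()
        self.hot_capacity = hot_capacity
        self.cold_capacity = cold_capacity

    def put(self, key: str, value: object) -> None:
        self.cold.pop(key, None)  # keep at most one copy across tiers
        self.hot[key] = value
        self.hot.move_to_end(key)  # most recently used
        self._spill()

    def get(self, key: str) -> Optional[object]:
        if key in self.hot:
            self.hot.move_to_end(key)
            return self.hot[key]
        if key in self.cold:
            value = self.cold.pop(key)
            self.put(key, value)  # promote on a cold-tier hit
            return value
        return None  # miss in both tiers: caller must recompute

    def tier_of(self, key: str) -> Optional[int]:
        if key in self.hot:
            return 0
        if key in self.cold:
            return 1
        return None

    def _spill(self) -> None:
        while len(self.hot) > self.hot_capacity:
            key, value = self.hot.popitem(last=False)  # demote LRU hot entry
            self.cold[key] = value
        while len(self.cold) > self.cold_capacity:
            self.cold.popitem(last=False)  # drop LRU cold entry entirely


cache = TieredStateCache(hot_capacity=2, cold_capacity=2)
for step in ["plan", "search", "answer"]:
    cache.put(step, f"state:{step}")
# "plan" was least recently used, so it was demoted to the cold tier.
print(cache.tier_of("plan"), cache.tier_of("answer"))  # 1 0
cache.get("plan")  # cold hit: promoted back, demoting "search"
print(cache.tier_of("plan"), cache.tier_of("search"))  # 0 1
```

Demoting instead of discarding keeps recently evicted state cheaper to recover than recomputing it, which is the usual motivation for tiering. How Cortex actually places, shares, or evicts agentic state is not described in this summary.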
