🤖 AI Summary
Current cloud-based agent workflows suffer from low service efficiency and poor resource utilization: their logic is tightly coupled with underlying models and hardware, and their distributed components hinder joint optimization across accuracy, latency, energy consumption, and cost. This paper proposes an efficient cloud service system for agent workflows, whose core innovation is decoupling declarative workflow specifications from execution configurations, enabling a full-stack co-optimization framework. The framework comprises a profile-guided optimizer and an adaptive runtime that supports dynamic reconfiguration to meet user-defined service-level objectives (SLOs). Experimental evaluation demonstrates that, while maintaining SLOs, the system reduces GPU usage by up to 2.8×, energy consumption by up to 3.7×, and operational cost by up to 4.3×.
📝 Abstract
Agentic workflows, which coordinate multiple models and tools with complex control logic, are quickly becoming the dominant paradigm for AI applications. However, serving them with today's frameworks remains inefficient. The key problem is that these frameworks expose workflows as opaque sequences of model and tool calls, tightly coupling agent logic with model and hardware choices. Moreover, workflow components are often fragmented across different entities, preventing systems from reasoning about trade-offs across accuracy, latency, energy, and cost. This leads to resource waste and degraded service-level objectives (SLOs).
We present Murakkab, a resource-efficient serving system for agentic workflows. Murakkab introduces a declarative abstraction that decouples workflow specification from execution configuration. A profile-guided optimizer and adaptive runtime jointly manage the full stack: orchestrating workflow components, mapping them to models and hardware, and dynamically reconfiguring execution to satisfy user-defined SLOs. By exposing the internal structure of agentic workflows, Murakkab enables cross-layer optimization that existing frameworks and cloud schedulers cannot achieve.
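To make the decoupling concrete, here is a minimal, hypothetical sketch (the names, data structures, and profile table below are illustrative assumptions, not Murakkab's actual API): a declarative workflow names its steps and SLO without baking in model or hardware choices, and a profile-guided optimizer selects the cheapest execution configuration that satisfies the SLO.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    name: str
    task: str                       # e.g. "summarize"; no model/GPU specified
    depends_on: list = field(default_factory=list)

@dataclass
class Workflow:
    steps: list                     # declarative specification only
    slo_latency_s: float            # user-defined SLO

@dataclass
class ExecConfig:
    model: str                      # chosen by the optimizer, not the author
    gpu: str

def choose_config(step, profiles, slo_s):
    """Pick the cheapest profiled (model, GPU) pair that meets the latency SLO."""
    feasible = [p for p in profiles[step.task] if p["latency_s"] <= slo_s]
    best = min(feasible, key=lambda p: p["cost"])
    return ExecConfig(model=best["model"], gpu=best["gpu"])

# Toy profile table with made-up numbers, standing in for offline profiling.
profiles = {
    "summarize": [
        {"model": "large-llm", "gpu": "H100", "latency_s": 0.8, "cost": 9.0},
        {"model": "small-llm", "gpu": "A10",  "latency_s": 1.5, "cost": 2.0},
    ],
}

wf = Workflow(steps=[Step("s1", "summarize")], slo_latency_s=2.0)
cfg = choose_config(wf.steps[0], profiles, wf.slo_latency_s)
print(cfg.model, cfg.gpu)  # → small-llm A10 (cheaper option still meets the SLO)
```

Because the specification carries no binding to a model or GPU, the same workflow can be re-optimized at runtime when load, hardware availability, or the SLO changes, which is the lever behind the cross-layer optimization described above.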
Our evaluation on diverse workflows shows that Murakkab reduces GPU usage by up to 2.8×, energy consumption by 3.7×, and cost by 4.3× while maintaining SLOs.