🤖 AI Summary
Existing agent-based AI workflow frameworks struggle to balance scalability and reproducibility due to fragmented data orchestration, high serialization overhead, and non-deterministic execution. This work proposes modeling agent workflows as operator abstractions and introduces a unified distributed runtime that enables efficient interoperability among preprocessing, embedding, and vector retrieval through a zero-copy data plane built on Apache Arrow and Cylon. By incorporating resource-deterministic scheduling and asynchronous batching, the system achieves, for the first time, deterministic and scalable execution of agent workflows under high-performance computing paradigms. Experimental results demonstrate that, while maintaining comparable large language model generation throughput, the system attains up to a 4.64× pipeline speedup and a 2.8× improvement in embedding write performance.
📝 Abstract
Agentic workflows in large language model (LLM) systems integrate retrieval, reasoning, and memory, but existing frameworks suffer from scalability and reproducibility limitations due to fragmented data orchestration, serialization overhead, and non-deterministic execution. Although these frameworks increase flexibility, they lack a formal execution model that adheres to the principles of high-performance computing. We introduce AAFLOW, a unified distributed runtime that produces communication-efficient execution plans by modeling agentic workflows as an operator abstraction. Using Apache Arrow and Cylon, AAFLOW builds a zero-copy data plane that allows direct interoperability between preprocessing, embedding, and vector retrieval without serialization overhead. To lower coordination costs, it uses resource-deterministic scheduling and asynchronous batching. Experimental results demonstrate up to a 4.64× pipeline speedup and 2.8× gains in the embedding and upsert phases, while retaining comparable LLM generation throughput. These advantages result from improved data flow, batching, and communication efficiency rather than from LLM inference acceleration.
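The asynchronous batching mentioned in the abstract can be illustrated with a minimal, single-process sketch. This is not AAFLOW's implementation: the names `embed_batch`, `batcher`, and `run_pipeline` are hypothetical, the embedding call is a stand-in that returns a fake one-element vector, and the sketch uses only Python's standard `asyncio` rather than Arrow or Cylon. It only shows the general idea of decoupling a preprocessing producer from a batched embedding consumer through a bounded queue.

```python
import asyncio


async def embed_batch(texts):
    # Hypothetical stand-in for an embedding call: one fake vector per text.
    await asyncio.sleep(0)  # yield control, as a real I/O-bound call would
    return [[float(len(t))] for t in texts]


async def batcher(queue, batch_size, results):
    # Consume items from the queue and embed them in fixed-size batches.
    batch = []
    while True:
        item = await queue.get()
        if item is None:  # sentinel: no more input
            break
        batch.append(item)
        if len(batch) == batch_size:
            results.extend(await embed_batch(batch))
            batch = []
    if batch:  # flush the final partial batch
        results.extend(await embed_batch(batch))


async def run_pipeline(docs, batch_size=4):
    # The bounded queue applies backpressure to the producer.
    queue = asyncio.Queue(maxsize=2 * batch_size)
    results = []
    consumer = asyncio.create_task(batcher(queue, batch_size, results))
    for doc in docs:
        await queue.put(doc.strip().lower())  # toy preprocessing step
    await queue.put(None)
    await consumer
    return results


if __name__ == "__main__":
    print(asyncio.run(run_pipeline(["Alpha", " beta ", "Gamma"], batch_size=2)))
```

With a single consumer, output order matches input order, which keeps this toy pipeline deterministic. A distributed runtime would need an explicit scheduling policy to give the same guarantee.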