🤖 AI Summary
Problem: Existing LLM application frameworks employ coarse-grained, module-level orchestration that optimizes only within individual components and neglects coordinated scheduling across LLM and non-LLM components, which leads to high end-to-end latency.
Method: We propose a primitive-level, fine-grained dataflow modeling approach that decomposes each task into unified primitive units and constructs a joint scheduling graph spanning both LLM and non-LLM components. We further design a dynamic orchestration engine with scheduling algorithms for primitive-level parallelism and pipelining, achieving end-to-end co-optimization.
Contribution/Results: This work introduces a primitive-level end-to-end orchestration paradigm that exposes optimization opportunities beyond those available within individual modules. Evaluated on a range of popular LLM applications, the approach achieves up to a 2.09× end-to-end speedup over existing systems, demonstrating both effectiveness and generality.
📝 Abstract
Large language model (LLM)-based applications consist of both LLM and non-LLM components, each contributing to the end-to-end latency. Despite great efforts to optimize LLM inference, end-to-end workflow optimization has been overlooked. Existing frameworks employ coarse-grained orchestration with task modules, which confines optimizations to within each module and yields suboptimal scheduling decisions. We propose fine-grained end-to-end orchestration, which utilizes task primitives as the basic units and represents each query's workflow as a primitive-level dataflow graph. This explicitly exposes a much larger design space, enables optimizations in parallelization and pipelining across primitives of different modules, and enhances scheduling to improve application-level performance. We build Teola, a novel orchestration framework for LLM-based applications that implements this scheme. Comprehensive experiments show that Teola can achieve up to 2.09x speedup over existing systems across various popular LLM applications.
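To make the difference between module-level and primitive-level orchestration concrete, the following is a minimal sketch of a toy retrieval-augmented workflow modeled as a dataflow graph. It is not Teola's API or scheduler: the primitive names, the durations (in milliseconds), and the `makespan` helper are all hypothetical, and the model assumes unlimited parallel workers. It only illustrates why exposing primitives can shorten the critical path: the LLM can prefill the static instruction part of the prompt while the non-LLM embedding and search primitives are still running.

```python
from graphlib import TopologicalSorter

# Hypothetical per-primitive latencies in milliseconds (illustrative only).
durations = {
    "embed_query": 20,          # non-LLM: embed the user query
    "vector_search": 50,        # non-LLM: retrieve documents
    "prefill_instruction": 60,  # LLM: prefill the static prompt prefix
    "prefill_context": 40,      # LLM: prefill the retrieved documents
    "decode": 200,              # LLM: generate the answer
}

def makespan(durations, deps):
    """Earliest finish time of a DAG, assuming unlimited parallel workers.

    `deps` maps each node to the set of nodes it must wait for.
    """
    finish = {}
    for node in TopologicalSorter(deps).static_order():
        start = max((finish[d] for d in deps[node]), default=0)
        finish[node] = start + durations[node]
    return max(finish.values())

# Coarse-grained view: modules run back to back, so every primitive
# waits for the one before it (embedding -> retrieval -> LLM).
coarse = {
    "embed_query": set(),
    "vector_search": {"embed_query"},
    "prefill_instruction": {"vector_search"},
    "prefill_context": {"prefill_instruction"},
    "decode": {"prefill_context"},
}

# Primitive-level view: only true data dependencies remain, so the
# instruction prefill no longer waits for retrieval to finish.
fine = {
    "embed_query": set(),
    "vector_search": {"embed_query"},
    "prefill_instruction": set(),
    "prefill_context": {"vector_search", "prefill_instruction"},
    "decode": {"prefill_context"},
}

coarse_ms = makespan(durations, coarse)
fine_ms = makespan(durations, fine)
print(f"module-level: {coarse_ms} ms, primitive-level: {fine_ms} ms")
```

With these made-up numbers the critical path drops from 370 ms to 310 ms. The size of the gain depends entirely on the assumed durations and has no bearing on the 2.09x figure reported in the abstract.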