LLMSched: Uncertainty-Aware Workload Scheduling for Compound LLM Applications

📅 2025-04-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address inefficient scheduling of compound large language model (LLM) applications—caused by inherent execution-duration variability and structural uncertainty—this paper proposes LLMSched, an uncertainty-aware scheduling framework. The method models compound applications as DAGs with uncertain stages, adopts a Bayesian network to profile task dependencies and identify uncertainty-reducing stages, and introduces an entropy-based mechanism to quantify each stage's uncertainty reduction. The scheduler then combines this uncertainty-reduction strategy with a job completion time (JCT)-efficient scheme. Evaluations on both simulation and real-world LLM serving testbeds demonstrate that LLMSched reduces average JCT by 14–79% over state-of-the-art (SOTA) schedulers, improving scheduling efficiency and robustness for compound LLM workloads.

📝 Abstract
Developing compound Large Language Model (LLM) applications is becoming an increasingly prevalent approach to solving real-world problems. In these applications, an LLM collaborates with various external modules, including APIs and even other LLMs, to realize complex intelligent services. However, we reveal that the intrinsic duration and structural uncertainty in compound LLM applications pose great challenges for LLM service providers in serving and scheduling them efficiently. In this paper, we propose LLMSched, an uncertainty-aware scheduling framework for emerging compound LLM applications. In LLMSched, we first design a novel DAG-based model to describe the uncertain compound LLM applications. Then, we adopt the Bayesian network to comprehensively profile compound LLM applications and identify uncertainty-reducing stages, along with an entropy-based mechanism to quantify their uncertainty reduction. Combining an uncertainty reduction strategy and a job completion time (JCT)-efficient scheme, we further propose an efficient scheduler to reduce the average JCT. Evaluation of both simulation and testbed experiments on various representative compound LLM applications shows that compared to existing state-of-the-art scheduling schemes, LLMSched can reduce the average JCT by 14~79%.
Problem

Research questions and friction points this paper is trying to address.

Addressing scheduling challenges in compound LLM applications
Modeling uncertainty in LLM workflows using DAGs
Reducing job completion time via uncertainty-aware scheduling
Innovation

Methods, ideas, or system contributions that make the work stand out.

DAG-based model for uncertain LLM applications
Bayesian network profiling for uncertainty reduction
Entropy-based mechanism quantifying uncertainty reduction
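The entropy-based mechanism can be illustrated with a minimal sketch (not the paper's implementation; the example distributions are hypothetical): a stage's uncertainty reduction is the drop in Shannon entropy between the scheduler's prior belief over possible downstream DAG structures and the posterior belief after observing that stage's output.

```python
import math

def entropy(probs):
    """Shannon entropy (bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical prior belief over three possible downstream DAG
# structures of a compound LLM job, before a stage executes.
prior = [0.5, 0.3, 0.2]

# Posterior after executing an uncertainty-reducing stage (e.g. obtained
# by conditioning a Bayesian network on the observed stage output).
posterior = [0.9, 0.05, 0.05]

# Uncertainty reduction credited to that stage, in bits.
reduction = entropy(prior) - entropy(posterior)
print(round(reduction, 3))  # → 0.916
```

Stages with larger entropy drops resolve more structural uncertainty, so an uncertainty-aware scheduler can prioritize them to firm up its view of the remaining workload earlier.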
Botao Zhu
UM-SJTU Joint Institute, Shanghai Jiao Tong University, Shanghai, China
Chen Chen
John Hopcroft Center for Computer Science, Shanghai Jiao Tong University, Shanghai, China
Xiaoyi Fan
Unknown affiliation
Yifei Zhu
Shanghai Jiao Tong University
Edge computing, multimedia networking, distributed ML systems