SPARQ: An Optimization Framework for the Distribution of AI-Intensive Applications under Non-Linear Delay Constraints

📅 2025-09-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing edge-cloud orchestration approaches for distributed AI applications struggle to model the strongly nonlinear relationship between resource utilization and end-to-end latency, leading to latency violations and resource waste. To address this, the paper extends the cloud network flow optimization framework with queueing-delay-aware orchestration, introducing two coupled execution models, Guaranteed-Resource (GR) and Shared-Resource (SR), which use M/M/1 and M/G/1 queueing dynamics to accurately capture the nonlinear computation and communication delays of dedicated and shared resources, respectively. The resulting joint service placement, routing, and resource allocation problem is non-convex due to its nonlinear latency constraints, so the authors design SPARQ, an iterative approximation algorithm that decomposes the original non-convex problem into two efficiently solvable convex subproblems. Experiments demonstrate that the framework achieves 32% higher resource utilization while strictly satisfying stringent latency constraints, and improves the cost–latency trade-off by over 19% compared to state-of-the-art methods.

📝 Abstract
Next-generation real-time compute-intensive applications, such as extended reality, multi-user gaming, and autonomous transportation, are increasingly composed of heterogeneous AI-intensive functions with diverse resource requirements and stringent latency constraints. While recent advances have enabled highly efficient algorithms for joint service placement, routing, and resource allocation for increasingly complex applications, current models fail to capture the non-linear relationship between delay and resource usage that becomes especially relevant in AI-intensive workloads. In this paper, we extend the cloud network flow optimization framework to support queueing-delay-aware orchestration of distributed AI applications over edge-cloud infrastructures. We introduce two execution models, Guaranteed-Resource (GR) and Shared-Resource (SR), that more accurately capture how computation and communication delays emerge from system-level resource constraints. These models incorporate M/M/1 and M/G/1 queue dynamics to represent dedicated and shared resource usage, respectively. The resulting optimization problem is non-convex due to the non-linear delay terms. To overcome this, we develop SPARQ, an iterative approximation algorithm that decomposes the problem into two convex sub-problems, enabling joint optimization of service placement, routing, and resource allocation under non-linear delay constraints. Simulation results demonstrate that SPARQ not only offers a more faithful representation of system delays, but also substantially improves resource efficiency and the overall cost–delay trade-off compared to existing state-of-the-art methods.
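The nonlinearity the abstract refers to comes from standard single-queue delay formulas. As a minimal sketch (these are the textbook M/M/1 and Pollaczek–Khinchine M/G/1 closed forms, not code from the paper, and the function names are our own), the mean sojourn time grows explosively as utilization approaches 1:

```python
def mm1_delay(arrival_rate, service_rate):
    """Mean sojourn time of an M/M/1 queue: W = 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)

def mg1_delay(arrival_rate, mean_service, second_moment_service):
    """Mean sojourn time of an M/G/1 queue via Pollaczek-Khinchine:
    W = E[S] + lambda * E[S^2] / (2 * (1 - rho)), with rho = lambda * E[S]."""
    rho = arrival_rate * mean_service
    if rho >= 1.0:
        raise ValueError("unstable queue: utilization must be below 1")
    return mean_service + arrival_rate * second_moment_service / (2.0 * (1.0 - rho))

# Delay is strongly nonlinear in utilization: doubling rho from 0.5 to
# nearly 1.0 multiplies the delay by orders of magnitude.
for lam in (0.5, 0.9, 0.99):
    print(f"rho={lam:.2f}  M/M/1 delay={mm1_delay(lam, 1.0):.1f}")
```

A quick consistency check: with exponentially distributed service (E[S] = 1/mu, E[S^2] = 2/mu^2), the M/G/1 formula reduces exactly to the M/M/1 result, which is why the paper can treat the two models within one framework.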
Problem

Research questions and friction points this paper is trying to address.

Optimizing AI application distribution under nonlinear delay constraints
Capturing nonlinear delay-resource relationships in AI workloads
Enabling queuing-aware orchestration across edge-cloud infrastructures
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends cloud network flow with queuing-delay-aware orchestration
Introduces Guaranteed-Resource and Shared-Resource execution models
Develops iterative approximation algorithm for non-convex optimization
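The last bullet describes decomposing a non-convex problem into convex subproblems solved iteratively. As a toy illustration of that general idea only (this is generic block-coordinate descent on an invented biconvex function, not the paper's SPARQ algorithm), each subproblem below is convex and admits a closed-form minimizer even though the joint problem is non-convex:

```python
def f(x, y):
    """Toy objective: jointly non-convex, but convex in x for fixed y
    and convex in y for fixed x (biconvex)."""
    return (x * y - 1.0) ** 2 + x ** 2 + y ** 2

def alternating_minimization(x=2.0, y=0.1, steps=1000):
    """Alternate between the two convex subproblems, each solved exactly:
    argmin_x f(x, y) = y / (1 + y**2), and symmetrically for y.
    Exact block minimization makes f monotonically non-increasing."""
    for _ in range(steps):
        x = y / (1.0 + y * y)  # closed-form solution of the convex subproblem in x
        y = x / (1.0 + x * x)  # closed-form solution of the convex subproblem in y
    return x, y

x, y = alternating_minimization()
print(f"converged near ({x:.4f}, {y:.4f}), f = {f(x, y):.4f}")  # f approaches 1
```

The design point this mirrors is that an iterative scheme over convex subproblems guarantees monotone objective improvement and convergence to a stationary point, even when the joint problem (here, placement, routing, and resource allocation under nonlinear delay constraints) is non-convex.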
Pietro Spadaccino
Department of Information Engineering, Electronics, and Telecommunications of Sapienza University, 00184 Roma, Italy
Paolo Di Lorenzo
Sapienza University of Rome
Signal Processing · Machine Learning · Wireless Communications · Network Theory
Sergio Barbarossa
Sapienza University of Rome
Signal Processing · Graph Signal Processing · Mobile Edge Computing · 5G · 6G
Antonia M. Tulino
Department of Electrical Engineering, Università degli Studi di Napoli Federico II, 80138 Napoli, Italy, and the Electrical and Computer Engineering Department, New York University, USA
Jaime Llorca
University of Trento and New York University
Next Generation Networks · Mobile/Edge/Cloud Computing · Network Optimization · Resource Allocation