Hybrid Orchestration of Edge AI and Microservices via Graph-based Self-Imitation Learning

📅 2026-03-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of orchestrating heterogeneous request chains composed of AI services and traditional microservices in resource-constrained edge environments, where tightly coupled deployment and routing decisions render existing isolated optimization approaches ineffective for ensuring system performance. To this end, the paper introduces self-imitation learning into edge AI microservice orchestration for the first time, formulating hybrid orchestration as a sequential decision-making problem. It leverages a graph attention network to encode service topology and dependency relationships and integrates a self-imitation-enhanced proximal policy optimization (PPO) algorithm to jointly optimize deployment and routing strategies. The proposed method effectively explores high-reward trajectories in sparse-reward settings with large combinatorial action spaces, significantly reducing end-to-end latency and improving resource utilization, outperforming various heuristic, metaheuristic, and deep reinforcement learning baselines.
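The graph-attention encoding mentioned above can be illustrated with a minimal single-head layer in plain Python. This is only a sketch of the standard graph attention formulation (attention scores from a LeakyReLU over concatenated projected features, normalized with a softmax over each node's neighbors); the paper's actual architecture, feature set, and hyperparameters are not given here. The names `gat_layer`, `W`, `a`, and the service names `gateway` and `detector` are hypothetical.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(W, h):
    # Multiply matrix W (list of rows) by vector h.
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_layer(features, neighbors, W, a):
    """One single-head graph attention layer (illustrative, not the paper's code).

    features:  dict mapping node id -> feature vector
    neighbors: dict mapping node id -> non-empty list of neighbor ids
               (include the node itself to model a self-loop)
    W:         projection matrix, shape (out_dim, in_dim)
    a:         attention vector of length 2 * out_dim
    """
    projected = {n: matvec(W, h) for n, h in features.items()}
    out = {}
    for i, nbrs in neighbors.items():
        # Unnormalized score for each edge (i, j).
        scores = [
            leaky_relu(sum(ak * x for ak, x in zip(a, projected[i] + projected[j])))
            for j in nbrs
        ]
        # Numerically stable softmax over the neighborhood of i.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        # Attention-weighted sum of projected neighbor features.
        dim = len(projected[i])
        out[i] = [
            sum(al * projected[j][k] for al, j in zip(alphas, nbrs))
            for k in range(dim)
        ]
    return out


# Hypothetical two-service chain: gateway calls detector.
features = {"gateway": [1.0, 0.0], "detector": [0.0, 1.0]}
neighbors = {"gateway": ["gateway", "detector"], "detector": ["detector"]}
identity = [[1.0, 0.0], [0.0, 1.0]]
zero_attention = [0.0, 0.0, 0.0, 0.0]  # all scores equal, so attention is uniform
embeddings = gat_layer(features, neighbors, identity, zero_attention)
```

With a zero attention vector every neighbor receives equal weight, so the layer reduces to neighborhood averaging; a learned `a` would let the encoder weight some dependencies in the request chain more heavily than others.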

📝 Abstract

Modern edge AI applications increasingly rely on microservice architectures that integrate both AI services and conventional microservices into complex request chains with stringent latency requirements. Effectively orchestrating these heterogeneous services is crucial for ensuring low-latency performance, yet remains challenging due to their diverse resource demands and strong operational interdependencies under resource-constrained edge environments. In particular, frequent interactions between services tightly couple deployment and routing decisions, yet existing approaches optimize them in isolation, leading to fundamentally inadequate system performance. In this paper, we propose SIL-GPO, a reinforcement learning framework that optimizes hybrid orchestration for edge AI microservice systems. SIL-GPO formulates the orchestration problem as a sequential decision-making task and leverages graph attention networks to encode service topologies and routing dependencies within the agent state representation. Moreover, SIL-GPO integrates a self-imitation learning strategy into proximal policy optimization, enabling the agent to prioritize and reuse high-reward trajectories. This guides policy updates towards globally promising solutions that standard RL often fails to discover under sparse rewards and large combinatorial action spaces. We conduct extensive experiments on trace-driven edge AI workloads, demonstrating that SIL-GPO significantly reduces end-to-end service latency and enhances resource utilization compared to state-of-the-art heuristic, metaheuristic, and deep RL baselines. Our framework offers a unified and scalable solution for efficient orchestration of AI services and microservices at the edge, paving the way for low-latency, high-performance edge AI deployments.
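The idea of reusing high-reward trajectories can be sketched with the commonly used self-imitation objective, in which a stored transition contributes to the loss only when its observed return exceeds the current value estimate. The abstract does not specify how SIL-GPO combines this with the PPO loss, so the function below is an assumption-laden illustration: the name `sil_losses`, the `value_coef` weight, and the plain-float inputs are all hypothetical.

```python
import math


def sil_losses(log_probs, values, returns, value_coef=0.01):
    """Self-imitation terms over a batch of replayed transitions (illustrative).

    log_probs: log pi(a|s) of the stored action under the current policy
    values:    current value estimates V(s)
    returns:   discounted returns R observed when the transition was stored
    """
    if not log_probs:
        return 0.0, 0.0, 0.0
    policy_terms, value_terms = [], []
    for logp, v, r in zip(log_probs, values, returns):
        # Clipped advantage (R - V)_+ : zero unless the past outcome beat the estimate.
        gain = max(r - v, 0.0)
        # In an autodiff framework, gain would be treated as a constant here.
        policy_terms.append(-logp * gain)
        value_terms.append(0.5 * gain ** 2)
    n = len(policy_terms)
    policy_loss = sum(policy_terms) / n
    value_loss = sum(value_terms) / n
    return policy_loss + value_coef * value_loss, policy_loss, value_loss


# Two replayed transitions: the first beat its value estimate, the second did not.
total, policy_loss, value_loss = sil_losses(
    log_probs=[math.log(0.5), math.log(0.25)],
    values=[1.0, 3.0],
    returns=[2.0, 1.0],
)
```

Only the first transition contributes, because its return (2.0) exceeds its value estimate (1.0); the second is ignored. In a sparse-reward setting this pushes the policy toward the few good orchestration decisions it has already found, and in practice a term like this would be added to the usual PPO clipped objective.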

Problem

Research questions and friction points this paper is trying to address.

Edge AI

Microservices

Service Orchestration

Latency Optimization

Resource Constraints

Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph-based Self-Imitation Learning

Hybrid Orchestration

Edge AI Microservices