🤖 AI Summary
To address high latency, resource constraints, and poor energy efficiency in AI inference services under cloud-edge collaboration, this paper proposes an architecture-aware lightweight scheduling framework that enables intelligent orchestration of heterogeneous resources based on the performance characteristics of modern inference engines. The framework integrates offline performance modeling with online dynamic decision-making, built natively atop the Kubernetes ecosystem. It incorporates fine-grained performance modeling, adaptive workload allocation, and cross-architecture hardware adaptation. Experimental evaluation demonstrates that, compared to a state-of-the-art solution, the framework reduces Quality of Service (QoS) violations by an average of 2.4×, while significantly improving inference throughput and resource utilization. Crucially, it simultaneously satisfies stringent low-latency requirements, enhances energy efficiency, and preserves data privacy—enabling scalable, sustainable, and secure AI inference deployment across cloud-edge environments.
📝 Abstract
The rapid evolution of Artificial Intelligence (AI) and Machine Learning (ML) has significantly heightened computational demands, particularly for inference-serving workloads. While traditional cloud-based deployments offer scalability, they face challenges such as network congestion, high energy consumption, and privacy concerns. In contrast, edge computing provides low-latency and sustainable alternatives but is constrained by limited computational resources. In this work, we introduce SynergAI, a novel framework designed for performance- and architecture-aware inference serving across heterogeneous edge-to-cloud infrastructures. Built upon a comprehensive performance characterization of modern inference engines, SynergAI integrates a combination of offline and online decision-making policies to deliver intelligent, lightweight, and architecture-aware scheduling. By dynamically allocating workloads across diverse hardware architectures, it effectively minimizes Quality of Service (QoS) violations. We implement SynergAI within a Kubernetes-based ecosystem and evaluate its efficiency. Our results demonstrate that architecture-driven inference serving enables optimized and architecture-aware deployments on emerging hardware platforms, achieving an average reduction of 2.4x in QoS violations compared to a State-of-the-Art (SotA) solution.
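The combination of offline profiling and online decision-making described in the abstract can be illustrated with a minimal sketch. This is not SynergAI's actual implementation; the profile table, node descriptions, and latency numbers below are all hypothetical, and the online policy is reduced to a single load-scaling heuristic for clarity.

```python
# Illustrative sketch (hypothetical, not the paper's implementation):
# architecture-aware placement using offline-profiled latencies plus a
# simple online QoS feasibility check.

OFFLINE_PROFILE = {
    # (model, architecture) -> profiled inference latency in ms (hypothetical)
    ("resnet50", "x86_gpu"): 12.0,
    ("resnet50", "arm_edge"): 48.0,
    ("bert_base", "x86_gpu"): 30.0,
    ("bert_base", "arm_edge"): 140.0,
}

def place(model, nodes, qos_ms):
    """Pick the node with the lowest predicted latency under the QoS target.

    nodes: list of (node_name, architecture, load_factor) tuples, where the
    load factor scales the offline-profiled latency (1.0 = idle baseline).
    Returns the chosen node name, or None if every node would violate QoS.
    """
    best, best_latency = None, float("inf")
    for name, arch, load in nodes:
        base = OFFLINE_PROFILE.get((model, arch))
        if base is None:
            continue  # no offline profile for this architecture
        predicted = base * load  # crude online adjustment for current load
        if predicted <= qos_ms and predicted < best_latency:
            best, best_latency = name, predicted
    return best
```

For example, with a lightly loaded edge node and a busier cloud GPU node, `place("resnet50", [("edge-1", "arm_edge", 1.0), ("cloud-1", "x86_gpu", 2.0)], 30.0)` predicts 48 ms on the edge (infeasible) and 24 ms in the cloud, so the request is placed on `cloud-1`. A real scheduler would fold in energy, privacy, and throughput objectives alongside latency.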