High-Fidelity Network Management for Federated AI-as-a-Service: Cross-Domain Orchestration

📅 2026-02-16

🤖 AI Summary

This work addresses the lack of end-to-end guarantees against communication and inference impairments (delay, packet loss, inference latency, and inference error) in multi-domain Federated AI-as-a-Service (AIaaS) environments. The authors propose an assurance-oriented AIaaS management plane built on signed, composable, and verifiable Tail-Risk Envelopes (TREs). By combining stochastic network calculus with tail-risk modeling, the framework supports intent-driven joint orchestration of network and compute resources and yields a decomposable end-to-end budget for delay violation probabilities. A runtime telemetry-based auditing mechanism then attributes tail risk to individual domains for cross-domain accountability. Packet-level Monte-Carlo simulations show improved p99.9 latency compliance under overload through admission control, and robust tenant isolation under correlated bursty traffic.
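The decomposable budget mentioned above can be illustrated with the simplest available bound. The sketch below is not the paper's stochastic-network-calculus derivation. It assumes only Boole's inequality (the union bound): if domain i exceeds its delay target d_i with probability at most eps_i, then the end-to-end delay exceeds the sum of the d_i with probability at most the sum of the eps_i. The function names and the proportional weighting scheme are hypothetical, chosen for illustration.

```python
def split_risk_budget(total_epsilon, weights):
    """Split an end-to-end violation budget across domains.

    Hypothetical helper: each domain receives a share of total_epsilon
    proportional to its weight, so the shares sum to total_epsilon.
    """
    if not 0.0 < total_epsilon < 1.0:
        raise ValueError("total_epsilon must lie strictly between 0 and 1")
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be a non-empty list of positive numbers")
    total_weight = sum(weights)
    return [total_epsilon * w / total_weight for w in weights]


def end_to_end_bound(per_domain_epsilons):
    """Union bound on the end-to-end delay violation probability.

    Assumes domain i violates its own delay target with probability at most
    per_domain_epsilons[i]. No independence between domains is required.
    """
    return min(1.0, sum(per_domain_epsilons))


# Example: three tandem domains, the third given twice the weight.
budgets = split_risk_budget(1e-3, [1, 1, 2])
print(budgets, end_to_end_bound(budgets))
```

With weights [1, 1, 2], the third domain receives half of the budget. The union bound is conservative; a calculus-based composition such as the one the paper describes may give tighter end-to-end bounds.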

📝 Abstract

To support the emergence of AI-as-a-Service (AIaaS), communication service providers (CSPs) are on the verge of a radical transformation: from pure connectivity providers to providers of AIaaS as a managed network service (a control-and-orchestration plane that exposes AI models). In this model, the CSP is responsible not only for transport and communications, but also for intent-to-model resolution and joint network-compute orchestration, i.e., reliable and timely end-to-end delivery. The resulting end-to-end AIaaS service is thus governed by both communication impairments (delay, loss) and inference impairments (latency, error). A central open problem is the design of an operational AIaaS control-and-orchestration framework that enforces high fidelity, particularly under multi-domain federation. This paper introduces an assurance-oriented AIaaS management plane based on Tail-Risk Envelopes (TREs): signed, composable per-domain descriptors that combine deterministic guardrails with stochastic rate-latency-impairment models. Using stochastic network calculus, we derive bounds on end-to-end delay violation probabilities across tandem domains and obtain an optimization-ready risk-budget decomposition. We show that, under TRE contracts, tenant-level reservations prevent bursty traffic from inflating tail latency. An auditing layer then uses runtime telemetry to estimate extreme-percentile performance, quantify uncertainty, and attribute tail risk to each domain for accountability. Packet-level Monte-Carlo simulations demonstrate improved p99.9 compliance under overload via admission control and robust tenant isolation under correlated burstiness.
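The auditing step (estimating an extreme percentile from telemetry and quantifying its uncertainty) can be sketched with a nearest-rank quantile and a percentile bootstrap. This is a generic estimator assumed for illustration, not the paper's auditing method. The function names, the synthetic exponential delays, and the default parameters are all hypothetical.

```python
import math
import random


def empirical_quantile(samples, q):
    """Nearest-rank quantile: smallest sample with at least a q fraction at or below it."""
    if not samples or not 0.0 < q <= 1.0:
        raise ValueError("need a non-empty sample and 0 < q <= 1")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def bootstrap_quantile_ci(samples, q=0.999, n_boot=200, alpha=0.05, seed=0):
    """Point estimate of the q-quantile plus a percentile-bootstrap interval.

    Resamples the telemetry with replacement n_boot times and reads the
    interval endpoints from the sorted bootstrap estimates.
    """
    rng = random.Random(seed)
    n = len(samples)
    estimates = sorted(
        empirical_quantile(rng.choices(samples, k=n), q) for _ in range(n_boot)
    )
    lo = estimates[int((alpha / 2) * n_boot)]
    hi = estimates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return empirical_quantile(samples, q), lo, hi


# Example with synthetic delays (an assumption; real input would be telemetry).
rng = random.Random(42)
delays_ms = [rng.expovariate(1 / 5.0) for _ in range(5000)]
point, lo, hi = bootstrap_quantile_ci(delays_ms)
print(f"p99.9 estimate: {point:.2f} ms, 95% interval: [{lo:.2f}, {hi:.2f}] ms")
```

Bootstrap intervals for extreme quantiles become unreliable when only a handful of samples fall in the tail, so an estimate like this is only meaningful when the telemetry window holds many more than 1,000 samples.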

Problem

Research questions and friction points this paper is trying to address.

AI-as-a-Service

multi-domain federation

high-fidelity

network orchestration

tail-risk

Innovation

Methods, ideas, or system contributions that make the work stand out.

Tail-Risk Envelopes

Federated AI-as-a-Service

Stochastic Network Calculus