🤖 AI Summary
To address the challenge of guaranteeing inference quality of experience (QoE)—particularly latency—in edge AI, where heterogeneous devices suffer from resource constraints and dynamic network conditions, this paper proposes the first QoE-aware hybrid-parallel collaborative optimization framework. The framework jointly optimizes computational distribution, communication scheduling, and runtime adaptation through three key innovations: heterogeneity-aware model partitioning, contention-aware network scheduling, and multi-plan dynamic composition and adaptation. By formulating a multi-objective QoE model and employing lightweight online decision-making, it achieves low-latency, energy-efficient distributed inference. Evaluated on real-world edge scenarios—including smart home and traffic analytics—the framework improves inference throughput by 1.1–6.3× and reduces energy consumption by 21%–82%, while strictly satisfying end-to-end QoE constraints.
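The paper's planner is not detailed here, but its core selection step, choosing among candidate execution plans under a latency QoE constraint while minimizing energy, can be sketched as follows. All plan names and numbers below are illustrative assumptions, not the paper's actual formulation:

```python
from dataclasses import dataclass

@dataclass
class Plan:
    """A candidate hybrid-parallel execution plan (illustrative fields)."""
    name: str
    latency_ms: float   # estimated end-to-end inference latency
    energy_j: float     # estimated energy per inference

def select_plan(plans, latency_budget_ms):
    """Pick the most energy-efficient plan that meets the latency QoE bound."""
    feasible = [p for p in plans if p.latency_ms <= latency_budget_ms]
    if not feasible:
        return None  # no QoE-compliant plan; caller must relax the budget
    return min(feasible, key=lambda p: p.energy_j)

plans = [
    Plan("pipeline-2stage",   latency_ms=80.0, energy_j=3.1),
    Plan("data-parallel-3dev", latency_ms=55.0, energy_j=4.6),
    Plan("hybrid-2x2",        latency_ms=60.0, energy_j=3.8),
]
best = select_plan(plans, latency_budget_ms=70.0)
print(best.name)  # hybrid-2x2: cheapest feasible plan within the 70 ms budget
```

A real planner would estimate latency and energy from device and network profiles rather than take them as constants, but the feasibility-then-efficiency structure matches the trade-off the summary describes (meeting QoE strictly while cutting energy).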
📝 Abstract
With the proliferation of edge AI applications, satisfying user quality of experience (QoE) requirements, such as model inference latency, has become a first-class objective, as these models operate in resource-constrained settings and directly interact with users. Yet modern AI models routinely exceed the resource capacity of individual devices, necessitating distributed execution across heterogeneous devices over variable, contention-prone networks. Existing planners for hybrid (e.g., data and pipeline) parallelism largely optimize for throughput or device utilization, overlooking QoE; this leads to severe resource inefficiency (e.g., unnecessary energy drain) or to QoE violations under runtime dynamics.
We present Dora, a framework for QoE-aware hybrid parallelism in distributed edge AI training and inference. Dora jointly optimizes heterogeneous computation, contention-prone networks, and multi-dimensional QoE objectives via three key mechanisms: (i) a heterogeneity-aware model partitioner that determines and assigns model partitions across devices, forming a compact set of QoE-compliant plans; (ii) a contention-aware network scheduler that further refines these candidate plans by maximizing compute-communication overlap; and (iii) a runtime adapter that adaptively composes multiple plans to maximize global efficiency while respecting end-to-end QoE constraints. Across representative edge deployments, including smart homes, traffic analytics, and small edge clusters, Dora achieves 1.1–6.3× faster execution or, alternatively, 21%–82% lower energy consumption, all while maintaining QoE under runtime dynamics.
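As a rough illustration of the heterogeneity-aware partitioning idea (mechanism i), the sketch below splits a layer sequence into contiguous pipeline stages across devices of different speeds so that the slowest stage, which bounds pipeline throughput, is as fast as possible. The layer costs and device speeds are made-up numbers, and the brute-force search is only for clarity; the paper's actual partitioner is not specified here:

```python
from itertools import combinations

def partition(layer_flops, device_speeds):
    """Contiguous split of layers across devices, minimizing the bottleneck
    stage time (pipeline throughput is limited by the slowest stage)."""
    n, k = len(layer_flops), len(device_speeds)
    best = None
    # Try every way to place k-1 cut points between n layers (small n, k only).
    for cuts in combinations(range(1, n), k - 1):
        bounds = [0, *cuts, n]
        stage_times = [
            sum(layer_flops[bounds[i]:bounds[i + 1]]) / device_speeds[i]
            for i in range(k)
        ]
        bottleneck = max(stage_times)
        if best is None or bottleneck < best[0]:
            best = (bottleneck, bounds)
    return best

layer_flops = [4, 2, 6, 3, 5]   # per-layer compute cost (arbitrary units)
device_speeds = [2.0, 1.0]      # device 0 is 2x faster than device 1
bottleneck, bounds = partition(layer_flops, device_speeds)
print(bounds, bottleneck)  # [0, 4, 5] 7.5: fast device gets 4 layers
```

Note how the faster device absorbs most of the layers; a planner that ignored heterogeneity (e.g., an even split) would leave the slow device as a much worse bottleneck, which is exactly the inefficiency the abstract attributes to existing throughput-oriented planners.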