🤖 AI Summary
To address the challenge of guaranteeing inference quality of experience (QoE)—particularly latency—in edge AI, where heterogeneous devices suffer from resource constraints and dynamic network conditions, this paper proposes the first QoE-aware hybrid-parallel collaborative optimization framework. The framework jointly optimizes computational distribution, communication scheduling, and runtime adaptation through three key innovations: heterogeneity-aware model partitioning, contention-aware network scheduling, and multi-plan dynamic composition and adaptation. By formulating a multi-objective QoE model and employing lightweight online decision-making, it achieves low-latency, energy-efficient distributed inference. Evaluated on real-world edge scenarios—including smart home and traffic analytics—the framework improves inference throughput by 1.1–6.3× and reduces energy consumption by 21%–82%, while strictly satisfying end-to-end QoE constraints.
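The paper's planner is not detailed here, but its core selection step, choosing among candidate execution plans under a latency QoE constraint while minimizing energy, can be sketched as follows. All plan names and numbers below are illustrative assumptions, not the paper's actual formulation:

```python
from dataclasses import dataclass

@dataclass
class Plan:
    """A candidate hybrid-parallel execution plan (illustrative fields)."""
    name: str
    latency_ms: float   # estimated end-to-end inference latency
    energy_j: float     # estimated energy per inference

def select_plan(plans, latency_budget_ms):
    """Pick the most energy-efficient plan that meets the latency QoE bound."""
    feasible = [p for p in plans if p.latency_ms <= latency_budget_ms]
    if not feasible:
        return None  # no QoE-compliant plan; caller must relax the budget
    return min(feasible, key=lambda p: p.energy_j)

plans = [
    Plan("pipeline-2stage",   latency_ms=80.0, energy_j=3.1),
    Plan("data-parallel-3dev", latency_ms=55.0, energy_j=4.6),
    Plan("hybrid-2x2",        latency_ms=60.0, energy_j=3.8),
]
best = select_plan(plans, latency_budget_ms=70.0)
print(best.name)  # hybrid-2x2: cheapest feasible plan within the 70 ms budget
```

A real planner would estimate latency and energy from device and network profiles rather than take them as constants, but the feasibility-then-efficiency structure matches the trade-off the summary describes (meeting QoE strictly while cutting energy).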
📝 Abstract
With the proliferation of edge AI applications, satisfying user quality of experience (QoE) requirements, such as model inference latency, has become a first-class objective, as these models operate in resource-constrained settings and directly interact with users. Yet modern AI models routinely exceed the resource capacity of individual devices, necessitating distributed execution across heterogeneous devices over variable, contention-prone networks. Existing planners for hybrid (e.g., data and pipeline) parallelism largely optimize for throughput or device utilization, overlooking QoE; this leads to severe resource inefficiency (e.g., unnecessary energy drain) or to QoE violations under runtime dynamics.
We present Dora, a framework for QoE-aware hybrid parallelism in distributed edge AI training and inference. Dora jointly optimizes heterogeneous computation, contention-prone networks, and multi-dimensional QoE objectives via three key mechanisms: (i) a heterogeneity-aware model partitioner that determines and assigns model partitions across devices, forming a compact set of QoE-compliant plans; (ii) a contention-aware network scheduler that further refines these candidate plans by maximizing compute-communication overlap; and (iii) a runtime adapter that adaptively composes multiple plans to maximize global efficiency while respecting end-to-end QoE constraints. Across representative edge deployments, including smart homes, traffic analytics, and small edge clusters, Dora achieves 1.1–6.3× faster execution or, alternatively, 21%–82% lower energy consumption, all while maintaining QoE under runtime dynamics.
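As a rough illustration of the heterogeneity-aware partitioning idea (mechanism i), the sketch below splits a layer sequence into contiguous pipeline stages across devices of different speeds so that the slowest stage, which bounds pipeline throughput, is as fast as possible. The layer costs and device speeds are made-up numbers, and the brute-force search is only for clarity; the paper's actual partitioner is not specified here:

```python
from itertools import combinations

def partition(layer_flops, device_speeds):
    """Contiguous split of layers across devices, minimizing the bottleneck
    stage time (pipeline throughput is limited by the slowest stage)."""
    n, k = len(layer_flops), len(device_speeds)
    best = None
    # Try every way to place k-1 cut points between n layers (small n, k only).
    for cuts in combinations(range(1, n), k - 1):
        bounds = [0, *cuts, n]
        stage_times = [
            sum(layer_flops[bounds[i]:bounds[i + 1]]) / device_speeds[i]
            for i in range(k)
        ]
        bottleneck = max(stage_times)
        if best is None or bottleneck < best[0]:
            best = (bottleneck, bounds)
    return best

layer_flops = [4, 2, 6, 3, 5]   # per-layer compute cost (arbitrary units)
device_speeds = [2.0, 1.0]      # device 0 is 2x faster than device 1
bottleneck, bounds = partition(layer_flops, device_speeds)
print(bounds, bottleneck)  # [0, 4, 5] 7.5: fast device gets 4 layers
```

Note how the faster device absorbs most of the layers; a planner that ignored heterogeneity (e.g., an even split) would leave the slow device as a much worse bottleneck, which is exactly the inefficiency the abstract attributes to existing throughput-oriented planners.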