AI Summary
In mobile edge computing, DNN inference suffers from severe transmission bottlenecks and resource constraints under conventional layer-wise partitioning and sequential execution. To address this, we propose an operator-level collaborative inference system. Our method breaks inter-layer dependencies by decomposing models into local operators and enabling fine-grained parallel scheduling, tightly overlapping subtask computation with cross-device communication. Crucially, it co-designs the inference strategy with the model's intrinsic structural characteristics to optimize end-edge collaborative execution. Experiments show that, compared to state-of-the-art approaches, our system reduces single-inference latency by up to 50% and energy consumption by up to 75%, while strictly preserving the original model's accuracy.
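The overlap of subtask computation with cross-device communication can be illustrated with a minimal sketch (not from the paper): while one sub-operation's result is being "transmitted" on a background thread, the next sub-operation is computed on the main thread. The `compute`/`transmit` functions and their delays are placeholder stand-ins for real operator kernels and network transfers.

```python
import concurrent.futures
import time

def compute(chunk):
    # Stand-in for executing one sub-operation of a local operator
    time.sleep(0.01)
    return chunk * 2

def transmit(result):
    # Stand-in for sending a partial result to the edge server
    time.sleep(0.01)
    return result

chunks = [1, 2, 3, 4]
results = []
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
    pending = None
    for c in chunks:
        r = compute(c)                      # compute sub-operation i
        if pending is not None:
            results.append(pending.result())  # collect transfer of i-1
        pending = pool.submit(transmit, r)  # send i while computing i+1
    results.append(pending.result())

assert results == [2, 4, 6, 8]
```

Because each sub-operation is independent, its transmission can start as soon as it finishes, hiding transfer latency behind the remaining computation instead of waiting for an entire layer's output.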
Abstract
Deploying deep neural networks (DNNs) on resource-constrained mobile devices presents significant challenges, particularly in achieving real-time performance while coping with limited computational resources and battery life. While Mobile Edge Computing (MEC) makes collaborative inference with GPU servers a promising solution, existing approaches rely primarily on layer-wise model partitioning and suffer from significant transmission bottlenecks caused by the sequential execution of DNN operations. To address this challenge, we present Intra-DP, a high-performance collaborative inference system optimized for DNN inference on MEC. Intra-DP employs a novel parallel computing technique based on local operators (i.e., operators whose minimum unit of input is smaller than the entire input tensor, such as convolution, where each kernel application needs only a local receptive field). By decomposing their computations into several independent sub-operations and overlapping the computation and transmission of different sub-operations through parallel execution, Intra-DP mitigates transmission bottlenecks in MEC, enabling fast and energy-efficient inference. Our evaluation demonstrates that Intra-DP reduces per-inference latency by up to 50% and energy consumption by up to 75% compared to state-of-the-art baselines, without sacrificing accuracy.
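Why local operators admit this decomposition can be seen with a toy 1-D "valid" convolution (an illustrative sketch, not the paper's implementation): because each output element depends only on a small window of the input, the input can be split into overlapping chunks (each extended by a halo of `k - 1` samples) and each chunk convolved independently, yielding exactly the same result as convolving the whole tensor.

```python
def conv1d(x, w):
    # Naive 1-D "valid" convolution: output[i] = sum_j x[i+j] * w[j]
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) for i in range(len(x) - k + 1)]

x = [1, 2, 3, 4, 5, 6, 7, 8]   # toy input tensor
w = [1, 0, -1]                 # toy convolution kernel, k = 3
k = len(w)

full = conv1d(x, w)            # reference: convolve the whole input

# Split at the midpoint; the first chunk keeps a halo of k-1 extra
# samples so no output element straddles the cut.
mid = len(x) // 2
part1 = conv1d(x[:mid + k - 1], w)  # independent sub-operation 1
part2 = conv1d(x[mid:], w)          # independent sub-operation 2

assert part1 + part2 == full   # decomposition is exact
```

The two sub-operations share no intermediate state, so on a real system they could run on different devices, with each partial result transmitted as soon as it is ready.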