🤖 AI Summary
This work addresses the high inference latency in wireless edge computing caused by the lack of end-to-end co-optimization between communication and computation. To bridge this gap, the paper introduces the Wireless Neural Processing (WNP) paradigm, which, for the first time, unifies wireless transmission and multi-core accelerator execution into a single end-to-end pipeline. The authors propose the O-WiN framework to co-schedule multimodal DNN workloads and develop the Pipeline-Aware Co-Scheduling (PACS) algorithm, which interleaves communication and computation to enable pipeline parallelism and effectively mask transmission latency. Simulation results show that, under highly heterogeneous multimodal scenarios, PACS substantially outperforms the sequential RTFS baseline, achieving significant reductions in end-to-end inference latency.
📝 Abstract
In edge inference, wireless resource allocation and accelerator-level deep neural network (DNN) scheduling have yet to be co-optimized in an end-to-end manner. The lack of coordination between wireless transmission and accelerator-level DNN execution prevents efficient overlap, leading to higher end-to-end inference latency. To address this issue, this paper investigates multimodal DNN workload orchestration in wireless neural processing (WNP), a paradigm that integrates wireless transmission and multi-core accelerator execution into a unified end-to-end pipeline. First, we develop a unified communication-computation model for multimodal DNN execution and formulate the corresponding optimization problem. Second, we propose O-WiN, a framework that orchestrates DNN workloads in WNP through two tightly coupled stages: simulation-based optimization and runtime execution. Third, we develop two algorithms, RTFS and PACS. RTFS schedules communication and computation sequentially, whereas PACS interleaves them to enable pipeline parallelism by overlapping wireless data transfer with accelerator-level DNN execution. Simulation results demonstrate that PACS significantly outperforms RTFS under high modality heterogeneity by better masking wireless latency through communication-computation overlap, thereby highlighting the effectiveness of communication-computation pipelining in accelerating multimodal DNN execution in WNP.
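The latency benefit of interleaving described above can be illustrated with a toy model. The sketch below is not the paper's PACS algorithm; it only contrasts the two scheduling ideas the abstract names, using hypothetical per-modality transfer and compute times: a sequential RTFS-style schedule finishes all wireless transfers before computing, while a pipelined PACS-style schedule lets the accelerator start on a modality as soon as its own data has arrived, overlapping the remaining transfers with computation.

```python
# Toy latency model (all numbers hypothetical) contrasting sequential
# scheduling (RTFS-style) with interleaved scheduling (PACS-style).
# Each job is (transfer_ms, compute_ms) for one modality's DNN workload.

def sequential_latency(jobs):
    """RTFS-style: all wireless transfers complete before any compute starts."""
    return sum(tx for tx, _ in jobs) + sum(comp for _, comp in jobs)

def pipelined_latency(jobs):
    """PACS-style: a modality's compute starts as soon as its own
    transfer is done and the accelerator is free, so later transfers
    overlap with earlier computation."""
    tx_done = 0.0    # time the radio finishes the current transfer
    comp_done = 0.0  # time the accelerator becomes free
    for tx, comp in jobs:
        tx_done += tx                    # transfers are serialized on the link
        start = max(tx_done, comp_done)  # need both the data and a free core
        comp_done = start + comp
    return comp_done

# Hypothetical (transfer_ms, compute_ms) per modality, e.g. image/audio/text.
jobs = [(4.0, 6.0), (3.0, 5.0), (2.0, 4.0)]
print(sequential_latency(jobs))  # 24.0
print(pipelined_latency(jobs))   # 19.0 -- transfers 2 and 3 are masked
```

In this toy instance, pipelining hides 5 ms of wireless transfer behind accelerator execution; the gap grows as modality counts and transfer times increase, consistent with the abstract's claim that PACS helps most under high modality heterogeneity.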