🤖 AI Summary
This paper studies the dynamic selection of service modes, full offloading versus partial local processing, in a computation offloading system, with the goal of minimizing system delay. The proposed policy jointly uses the cloud server's occupancy and the local queue length: when the cloud is idle, the next job is fully offloaded; when the cloud is busy, the next job is processed locally only if the queue length exceeds a derived threshold. By modeling the system as a Markov decision process and analyzing the structure of the optimal policy, the paper establishes the optimality of this threshold policy under mild assumptions, and simulations demonstrate the resulting policy structure. The work thus offers a provably optimal, low-complexity approach to edge–cloud collaborative offloading.
📝 Abstract
We consider a simple computation offloading model in which jobs are either fully processed in the cloud or partially processed at a local server before being sent to the cloud to complete processing. Our goal is to design a policy that assigns jobs to service modes, i.e., full offloading or partial offloading, based on the state of the system, in order to minimize delay in the system. We show that when the cloud server is idle, the optimal policy is to assign the next job in the system queue to the cloud for processing. When the cloud server is busy, we show that, under mild assumptions, the optimal policy is of threshold type: it sends the next job in the system queue to the local server if the queue length exceeds a certain threshold. Finally, we demonstrate this policy structure through simulations.
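The state-dependent decision rule described above can be sketched as a small function. This is a minimal illustration, not the paper's implementation; the function and parameter names (`choose_mode`, `threshold`) and the `WAIT` action for a busy cloud with a short queue are illustrative assumptions, and the threshold value itself would come from the paper's analysis.

```python
# Sketch of the threshold-type offloading policy from the abstract.
# Names and the explicit "wait" action are illustrative assumptions,
# not notation from the paper.

FULL_OFFLOAD = "full_offload"    # send the job straight to the cloud
PARTIAL_LOCAL = "partial_local"  # pre-process locally, then finish in the cloud
WAIT = "wait"                    # keep the job in the system queue

def choose_mode(cloud_busy: bool, queue_len: int, threshold: int) -> str:
    """Decide the service mode for the head-of-queue job."""
    if not cloud_busy:
        # Idle cloud: full offloading is optimal per the abstract.
        return FULL_OFFLOAD
    # Busy cloud: begin local partial processing only once the queue
    # has grown past the threshold; otherwise keep the job queued.
    if queue_len > threshold:
        return PARTIAL_LOCAL
    return WAIT
```

For example, with a (hypothetical) threshold of 3, `choose_mode(True, 5, 3)` selects partial local processing, while `choose_mode(True, 2, 3)` keeps the job waiting for the cloud.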