AI Summary
In multi-UAV-assisted mobile edge computing (MEC) systems, jointly optimizing task offloading volume, latency, and energy consumption remains challenging. To address this, we propose TCRAMOP, a multi-objective framework for joint optimization of UAV trajectories and computational resource allocation. We further design DPPOIL, a distributed proximal policy optimization algorithm enhanced with generative adversarial imitation learning, to improve policy convergence and generalization under dynamic environmental conditions. Experimental results demonstrate that DPPOIL significantly outperforms baseline methods: it increases task offloading volume by 23.6%, reduces average offloading latency by 31.4%, and decreases total UAV energy consumption by 27.8%. The framework effectively balances system efficiency and resource overhead, offering a scalable, distributed intelligent decision-making solution for low-latency, high-energy-efficiency space-air-ground integrated edge computing.
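The three objectives above (offloading volume, latency, energy) are typically combined into a single scalar signal when the problem is solved with reinforcement learning. As a minimal sketch, assuming a weighted-sum scalarization (the function name, weights, and exact form below are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def scalarized_objective(num_offloaded: float,
                         total_delay: float,
                         total_energy: float,
                         w: tuple = (1.0, 0.5, 0.5)) -> float:
    """Hypothetical weighted-sum scalarization of the three TCRAMOP
    objectives: reward the number of offloaded tasks, penalize total
    offloading delay and total UAV energy consumption.

    The weights `w` trade off the objectives and are purely
    illustrative; the paper's actual reward design may differ."""
    w1, w2, w3 = w
    return w1 * num_offloaded - w2 * total_delay - w3 * total_energy

# Example: 10 tasks offloaded, 4 s total delay, 2 J total energy
value = scalarized_objective(10, 4.0, 2.0)  # 10 - 2.0 - 1.0 = 7.0
```

A weighted sum is the simplest scalarization; it lets a single-policy DRL agent optimize all three objectives at once, at the cost of fixing the trade-off a priori through the weights.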
Abstract
Mobile edge computing (MEC) is a promising technique for improving the computational capacity of smart devices (SDs) in the Internet of Things (IoT). However, the performance of MEC is restricted by its fixed location and limited service scope. Hence, we investigate an unmanned aerial vehicle (UAV)-assisted MEC system in which multiple UAVs are dispatched and each UAV can simultaneously provide computing services to multiple SDs. To improve system performance, we formulate a UAV-based trajectory control and resource allocation multi-objective optimization problem (TCRAMOP) that simultaneously maximizes the number of tasks offloaded to the UAVs and minimizes the total offloading delay and total UAV energy consumption by optimizing the UAVs' flight paths and the computing resources allocated to the served SDs. Then, considering that solving TCRAMOP requires continuous decision-making in a dynamic system, we propose an enhanced deep reinforcement learning (DRL) algorithm, namely distributed proximal policy optimization with imitation learning (DPPOIL), which incorporates generative adversarial imitation learning to improve policy performance. Simulation results demonstrate the effectiveness of the proposed DPPOIL and show that its learned strategy outperforms other baseline methods.
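DPPOIL combines PPO-style policy updates with a generative-adversarial imitation signal. As a minimal NumPy sketch of the two core quantities, the PPO clipped surrogate objective and a GAIL-style discriminator-based reward (function names, the clipping constant, and the reward form are standard PPO/GAIL definitions, not the paper's exact implementation):

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO clipped surrogate objective (to be maximized).

    ratio     = pi_new(a|s) / pi_old(a|s), the probability ratio
    advantage = estimated advantage A(s, a)
    eps       = clipping range (0.2 is the common default)

    Clipping removes the incentive to move the ratio outside
    [1 - eps, 1 + eps], stabilizing the policy update."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.minimum(unclipped, clipped)

def gail_reward(d_prob, eps=1e-8):
    """GAIL-style imitation reward from a discriminator output
    D(s, a) in (0, 1), where D is trained to score expert-like
    state-action pairs highly. r = -log(1 - D(s, a)) grows as the
    agent's behavior becomes indistinguishable from the expert's."""
    return -np.log(1.0 - d_prob + eps)

# In a DPPOIL-like scheme, the environment reward could be mixed with
# the imitation reward (the mixing weight lam is an assumption):
def mixed_reward(env_reward, d_prob, lam=0.5):
    return env_reward + lam * gail_reward(d_prob)
```

Distributing this scheme means running several actors that collect trajectories in parallel and performing the clipped update on the aggregated batch; the imitation term shapes the reward so the policy benefits from expert demonstrations early in training.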