AI Summary
This paper addresses the problem of UAV-UGV cooperative deployment for post-disaster emergency communications, aiming to minimize the number of UAVs while guaranteeing end-to-end QoS for ground users. Methodologically, it proposes a meta-reinforcement learning-based framework for joint UAV and UGV placement and trajectory optimization. It introduces Meta-A3C, the first integration of meta-learning with asynchronous advantage actor-critic (A3C), to enable rapid adaptation to dynamic disaster environments. Additionally, it models UGV mobility constraints via a road-topology graph, which restricts UGV movement to valid road segments. Experiments demonstrate that the proposed approach achieves 13.1% higher throughput than standard A3C and DDPG and executes 49% faster, while meeting the QoS requirements. Key contributions include: (1) a realistic UAV-UGV co-deployment model tailored to post-disaster scenarios; (2) the first application of meta-learning combined with asynchronous policy gradients to aerial-ground cooperative communication deployment; and (3) a lightweight, scalable, and deployable intelligent decision-making framework.
Abstract
Joint deployment of unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs) has been shown to be an effective method to establish communications in areas affected by disasters. However, ensuring good Quality of Service (QoS) while using as few UAVs as possible requires optimal positioning and trajectory planning for both UAVs and UGVs. This paper proposes a joint positioning and trajectory planning framework for UAV and UGV deployment that guarantees optimal QoS for ground users. To model the UGVs' mobility, we introduce a road graph, which directs their movement along valid road segments and enforces the road network constraints. To solve the sum-rate optimization problem, we reformulate it as a Markov Decision Process (MDP) and propose a novel Asynchronous Advantage Actor-Critic (A3C) algorithm that incorporates meta-learning for rapid adaptation to new environments and dynamic conditions. Numerical results demonstrate that our proposed Meta-A3C approach outperforms A3C and DDPG, delivering 13.1% higher throughput and 49% faster execution while meeting the QoS requirements.
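To make the road-graph idea concrete, the following minimal sketch shows one way UGV movement could be restricted to valid road segments. It is an illustration under stated assumptions, not the paper's implementation: the graph `ROAD_GRAPH`, its node labels, and the helpers `valid_moves` and `move_ugv` are all hypothetical, and the abstract does not specify how the road graph is encoded.

```python
from typing import Dict, List

# Hypothetical road graph (an assumption for illustration): nodes are
# intersections, and each listed neighbor is reachable by one road segment.
ROAD_GRAPH: Dict[str, List[str]] = {
    "A": ["B"],
    "B": ["A", "C", "D"],
    "C": ["B"],
    "D": ["B"],
}


def valid_moves(graph: Dict[str, List[str]], node: str) -> List[str]:
    """Return the nodes a UGV at `node` may occupy next.

    The UGV may stay where it is or traverse one adjacent road segment.
    """
    return [node] + sorted(graph[node])


def move_ugv(graph: Dict[str, List[str]], node: str, target: str) -> str:
    """Move the UGV to `target`, rejecting moves that leave the road network."""
    if target not in valid_moves(graph, node):
        raise ValueError(f"no road segment from {node} to {target}")
    return target
```

In an MDP formulation, a helper such as `valid_moves` could serve as an action mask, so that the agent only ever selects UGV actions that respect the road network.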