🤖 AI Summary
This work addresses the challenge of insufficient ground base station coverage under non-line-of-sight conditions in urban environments by proposing the multi-agent meta-advisor with advisor override (MAMO) framework, which leverages a swarm of unmanned aerial vehicle (UAV) base stations to ensure continuous connectivity for highly mobile users through cooperative and dynamic deployment. The approach formulates multi-UAV trajectory planning as a multi-task decentralized partially observable Markov decision process, employing a centralized training with decentralized execution architecture combined with double dueling deep Q-networks (3DQN). A meta-policy derived from cross-task joint learning guides exploration, while a dynamic advisor override mechanism rejects misleading guidance when the advisor fails to generalize. Experimental results demonstrate that MAMO achieves faster convergence and higher cumulative rewards across diverse real-world urban scenarios and UAV launch configurations, significantly outperforming $\epsilon$-greedy baselines, single-policy approaches, and advisor-only methods, thereby enhancing both generalization capability and communication performance in UAV-assisted networks.
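The dueling Q-network mentioned above splits value estimation into a state-value stream and an advantage stream. A minimal sketch of the standard dueling aggregation (not the paper's network, just the textbook decomposition $Q(s,a) = V(s) + A(s,a) - \frac{1}{|A|}\sum_{a'} A(s,a')$, with illustrative numbers):

```python
import numpy as np

def dueling_q_values(value, advantages):
    """Combine the value and advantage streams of a dueling DQN head.

    Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).
    Subtracting the mean advantage keeps the V/A decomposition
    identifiable (otherwise a constant could shift between streams).
    """
    advantages = np.asarray(advantages, dtype=float)
    return value + advantages - advantages.mean()

# Example: a state valued at 2.0 with three candidate actions.
q = dueling_q_values(2.0, [1.0, 0.0, -1.0])
# q -> [3.0, 2.0, 1.0]; the action ranking follows the advantages.
```

In a full agent, `value` and `advantages` would be the outputs of two network heads sharing a common feature encoder; here they are hand-picked scalars for illustration.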
📝 Abstract
Future vehicular networks require continuous connectivity to serve highly mobile users in urban environments. To mitigate the coverage limitations of fixed terrestrial macro base stations (MBSs) under non-line-of-sight (NLoS) conditions, fleets of unmanned aerial base stations (UABSs) can be deployed, dynamically repositioning to track vehicular users and traffic hotspots in coordination with the terrestrial network. This paper addresses cooperative multi-agent trajectory design under different service areas and takeoff configurations, where rapid and safe adaptation across scenarios is essential. We formulate the problem as a multi-task decentralized partially observable Markov decision process and solve it using centralized training with decentralized execution and a double dueling deep Q-network (3DQN), enabling online training for real-world deployments. However, efficient exploration remains a bottleneck, as conventional strategies such as $\epsilon$-greedy require careful tuning. To overcome this, we propose the multi-agent meta-advisor with advisor override (MAMO), a framework that guides agent exploration through a meta-policy learned jointly across tasks and employs a dynamic override mechanism that lets agents reject misaligned guidance when the advisor fails to generalize to a specific scenario. Simulation results across three realistic urban scenarios and multiple takeoff configurations show that MAMO converges faster and achieves higher returns than tuned $\epsilon$-greedy baselines, outperforming both an advisor-only ablation and a single generalized policy. Finally, we demonstrate that the learned UABS fleet significantly improves network performance compared to deployments without aerial support.
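The advisor-with-override idea can be sketched as an action-selection rule: the agent normally follows the meta-advisor's suggestion, but falls back to its own greedy action when its local Q-values indicate the advice is clearly worse, plus a residual random-exploration term. The function names, the margin-based override test, and all parameters below are illustrative assumptions, not the paper's exact mechanism:

```python
import random

def select_action(q_local, q_advisor, epsilon=0.1, override_margin=0.5):
    """Hypothetical advisor-guided exploration with override.

    q_local:  the agent's own Q-value estimates per action.
    q_advisor: the meta-policy's Q-value estimates per action.
    With probability epsilon, explore uniformly; otherwise follow the
    advisor unless the local policy rates the advised action worse
    than its own greedy action by more than override_margin.
    """
    n = len(q_local)
    if random.random() < epsilon:
        return random.randrange(n)  # residual random exploration
    advised = max(range(n), key=lambda a: q_advisor[a])
    greedy = max(range(n), key=lambda a: q_local[a])
    # Override: reject advice that looks misaligned in this scenario.
    if q_local[greedy] - q_local[advised] > override_margin:
        return greedy
    return advised
```

For example, with `epsilon=0`, local values `[0.0, 5.0, 0.0]` and advisor values `[5.0, 0.0, 0.0]`, the advised action 0 is overridden in favor of action 1; if the local gap were below the margin, the advisor's suggestion would be followed.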