🤖 AI Summary
This work addresses cooperative data collection by multiple UAVs under communication constraints, in a setting where the UAVs cannot execute their actions synchronously. The authors jointly optimize UAV trajectories and bandwidth allocation to improve energy efficiency. They formulate trajectory planning as a decentralized partially observable semi-Markov decision process (Dec-POSMDP) and introduce an asynchronous multi-agent reinforcement learning algorithm that learns cooperative trajectory policies, with each UAV acting on its local observations. Once the trajectory policies are learned, the bandwidth allocation at each collection point is solved optimally from local observations. Experiments show that the proposed approach outperforms existing learning-based and heuristic baselines in both energy efficiency and mission completion time, and that the learned policies remain robust under varying environmental conditions.
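To make "asynchronous" concrete, here is a minimal sketch of the event-driven rollout that a semi-Markov formulation implies, assuming each UAV's macro-action (for instance, flying to a collection point and gathering data) has a variable duration and that a UAV selects its next action only when its current one ends. `run_async_rollout`, the constant toy policy, and the duration model are hypothetical illustrations, not the authors' environment or learning algorithm; no learning happens here.

```python
import heapq
from typing import Callable, List, Tuple

# (decision time, uav_id, chosen action, action duration)
Decision = Tuple[float, int, int, float]


def run_async_rollout(
    num_uavs: int,
    horizon: float,
    policy: Callable[[int, float], int],
    duration_fn: Callable[[int, int], float],
) -> List[Decision]:
    """Simulate asynchronous decision epochs for several UAVs.

    Each UAV picks a new macro-action only when its previous one has finished,
    so decision times are generally not aligned across UAVs (semi-Markov style).
    """
    # Min-heap of (time the UAV becomes free to decide, uav_id); all start at t = 0.
    free_at = [(0.0, uav) for uav in range(num_uavs)]
    heapq.heapify(free_at)
    log: List[Decision] = []
    while free_at:
        t, uav = heapq.heappop(free_at)
        if t >= horizon:
            continue  # this UAV's episode is over; it is not rescheduled
        action = policy(uav, t)  # placeholder for a learned, local-observation policy
        duration = duration_fn(uav, action)  # variable sojourn time of the macro-action
        if duration <= 0:
            raise ValueError("macro-action durations must be positive")
        log.append((t, uav, action, duration))
        heapq.heappush(free_at, (t + duration, uav))
    return log


if __name__ == "__main__":
    # Hypothetical toy durations: UAV 0's actions take 2.0 time units, UAV 1's take 3.0.
    log = run_async_rollout(
        num_uavs=2,
        horizon=6.0,
        policy=lambda uav, t: 0,
        duration_fn=lambda uav, action: 2.0 if uav == 0 else 3.0,
    )
    print([(t, uav) for t, uav, _, _ in log])
    # [(0.0, 0), (0.0, 1), (2.0, 0), (3.0, 1), (4.0, 0)]
```

UAV 0 decides at times 0, 2, and 4 while UAV 1 decides at 0 and 3, so a synchronous "everyone acts every step" loop would not match this timeline. In a learning setting, each logged decision would become a transition whose length is the sojourn time; how the authors handle such transitions is not described in this summary.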
📝 Abstract
This paper addresses the joint optimization of trajectories and bandwidth allocation for multiple Unmanned Aerial Vehicles (UAVs) to enhance energy efficiency in the cooperative data collection problem. We focus on an important yet often overlooked aspect of the system: action synchronization across all UAVs is impossible. Since most existing learning-based solutions are not designed to learn in this asynchronous environment, we formulate the trajectory planning problem as a Decentralized Partially Observable Semi-Markov Decision Process and introduce an asynchronous multi-agent learning algorithm to learn UAVs’ cooperative policies. Once the UAVs’ trajectory policies are learned, the bandwidth allocation can be solved optimally based on local observations at each collection point. Comprehensive empirical results demonstrate the superiority of the proposed method over other learning-based and heuristic baselines in terms of both energy efficiency and mission completion time. Additionally, the learned policies exhibit robustness under varying environmental conditions.
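The abstract states that, once trajectories are fixed, the bandwidth allocation at each collection point can be solved optimally. The paper's exact channel model and objective are not given here, so the sketch below assumes a simple hypothetical one: K ground devices share a total bandwidth B, device k must upload D_k bits at rate b_k · s_k with spectral efficiency s_k = log2(1 + SNR_k) treated as independent of b_k, and the goal is to minimize the time until the last device finishes (a rough proxy for the UAV's hovering energy). Under that model the optimum equalizes finishing times, giving T* = Σ_k (D_k / s_k) / B and b_k = B · (D_k / s_k) / Σ_j (D_j / s_j). The function name `minmax_bandwidth_allocation` is illustrative only.

```python
import math
from typing import List, Tuple


def minmax_bandwidth_allocation(
    data_bits: List[float], snr_linear: List[float], total_bw_hz: float
) -> Tuple[List[float], float]:
    """Split total_bw_hz among devices so the slowest upload finishes as early as possible.

    Hypothetical model (not taken from the paper): device k uploads data_bits[k]
    at rate b_k * log2(1 + snr_k), where snr_k does not depend on b_k.
    Returns (per-device bandwidth in Hz, common completion time in seconds).
    """
    if total_bw_hz <= 0:
        raise ValueError("total bandwidth must be positive")
    if not data_bits or len(data_bits) != len(snr_linear):
        raise ValueError("need one SNR value per device")
    if any(s <= 0 for s in snr_linear) or any(d < 0 for d in data_bits):
        raise ValueError("SNRs must be positive and data sizes non-negative")

    eff = [math.log2(1.0 + s) for s in snr_linear]  # spectral efficiency, bit/s/Hz
    need = [d / e for d, e in zip(data_bits, eff)]  # Hz*s of spectrum-time each device needs
    total_need = sum(need)
    if total_need == 0:
        # Nothing to upload: any split works; return an equal split and zero time.
        return [total_bw_hz / len(need)] * len(need), 0.0

    # At the optimum all devices finish together, so b_k is proportional to need_k.
    alloc = [total_bw_hz * n / total_need for n in need]
    completion_time = total_need / total_bw_hz
    return alloc, completion_time


if __name__ == "__main__":
    # Toy numbers: 8 Mbit at SNR 3 (2 bit/s/Hz) and 4 Mbit at SNR 15 (4 bit/s/Hz), 1 MHz total.
    alloc, t_star = minmax_bandwidth_allocation([8e6, 4e6], [3.0, 15.0], 1e6)
    print(alloc, t_star)  # [800000.0, 200000.0] 5.0
```

For comparison, an equal 0.5 MHz split in the same toy example would take 8 s, because the first device becomes the bottleneck. A richer model (for example, noise power that scales with bandwidth) would make the rate concave in b_k and generally require a numerical solver rather than this closed form.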