🤖 AI Summary
This work addresses the challenge of balancing data collection and energy consumption in UAV-assisted IoT networks operating in dynamic urban environments, where existing approaches suffer from poor generalization and heavy reliance on large training datasets. To this end, the authors propose a novel multi-objective reinforcement learning architecture integrated with an attention mechanism for UAV trajectory planning. This framework adaptively trades off data gathering and energy usage under unknown wireless channel conditions. Notably, it is the first to incorporate attention mechanisms into multi-objective reinforcement learning, enabling a single model to generalize across diverse user preferences and dynamic scenarios without fine-tuning. Experimental results demonstrate that the proposed method outperforms existing reinforcement learning approaches in performance, model compactness, sample efficiency, and generalization to unseen environments.
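To make the trade-off concrete, here is a minimal sketch of how a multi-objective agent can scalarize the two objectives with a preference weight vector. The linear form, the function name, and the numbers are illustrative assumptions; the summary does not specify the paper's exact scalarization.

```python
import numpy as np

def scalarize_reward(data_collected, energy_used, preference):
    """Linear scalarization of the two objectives.

    `preference` is a weight vector w = (w_data, w_energy) on the
    simplex; varying it traces out different trade-off points.
    (Illustrative assumption -- the paper's scalarization may differ.)
    """
    objectives = np.array([data_collected, -energy_used])  # energy is a cost
    return float(np.dot(preference, objectives))

# Example: the same transition scored under two different preferences.
r_data_focused  = scalarize_reward(5.0, 2.0, np.array([0.9, 0.1]))  #  4.3
r_energy_saving = scalarize_reward(5.0, 2.0, np.array([0.2, 0.8]))  # -0.6
```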
📝 Abstract
Due to their adaptability and mobility, Unmanned Aerial Vehicles (UAVs) are becoming increasingly essential for wireless network services, particularly for data harvesting tasks. In this context, Artificial Intelligence (AI)-based approaches have gained significant attention for UAV path planning in large, complex environments, narrowing the gap to real-world deployments. However, many existing algorithms suffer from limited training data, which hampers their performance in highly dynamic environments. Moreover, they often overlook the inherently multi-objective nature of the task, treating it in an overly simplistic manner. To address these limitations, we propose an attention-based Multi-Objective Reinforcement Learning (MORL) architecture that explicitly handles the trade-off between data collection and energy consumption in urban environments, even without prior knowledge of wireless channel conditions. Our method learns a single model capable of adapting to varying trade-off preferences and dynamic scenario parameters without fine-tuning or retraining. Extensive simulations show that our approach achieves substantial improvements in performance, model compactness, sample efficiency, and, most importantly, generalization to previously unseen scenarios, outperforming existing RL solutions.
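As an illustration of the kind of model the abstract describes, the following PyTorch sketch shows a preference-conditioned policy that attends over per-device features, so one set of weights can serve many trade-off preferences. All layer sizes, names, and the fusion scheme (concatenating the preference into the attention query) are assumptions for exposition, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class PreferenceConditionedPolicy(nn.Module):
    """Sketch: attend over IoT-device features, conditioned on a
    2-D trade-off preference (data weight, energy weight)."""

    def __init__(self, device_feat_dim, uav_state_dim, n_actions,
                 embed_dim=64, n_heads=4):
        super().__init__()
        self.device_embed = nn.Linear(device_feat_dim, embed_dim)
        # Query built from the UAV state plus the preference vector.
        self.query_embed = nn.Linear(uav_state_dim + 2, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(embed_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, device_feats, uav_state, preference):
        # device_feats: (B, N_devices, F); uav_state: (B, S); preference: (B, 2)
        keys = self.device_embed(device_feats)                     # (B, N, E)
        query = self.query_embed(
            torch.cat([uav_state, preference], dim=-1)).unsqueeze(1)  # (B, 1, E)
        context, _ = self.attn(query, keys, keys)                  # (B, 1, E)
        return self.head(context.squeeze(1))                      # (B, n_actions)

# Usage: one model, different preferences at inference time -- no retraining.
policy = PreferenceConditionedPolicy(device_feat_dim=6, uav_state_dim=8, n_actions=5)
logits = policy(torch.randn(1, 10, 6), torch.randn(1, 8),
                torch.tensor([[0.7, 0.3]]))
```

Because the preference enters the attention query rather than being baked into the weights, changing it at inference time re-weights which devices the policy attends to, which is the zero-fine-tuning adaptability the abstract highlights.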