🤖 AI Summary
To address the discontinuous heterogeneous network coverage and the coupled optimization challenge of links and trajectories induced by 3D UAV mobility in Space-Air-Ground Integrated Networks (SAGIN), this paper proposes a novel two-tier multi-agent hierarchical reinforcement learning framework. At the top tier, a robust discrete link selection policy is realized via Double Deep Q-Networks (DDQN); at the bottom tier, a Lagrangian-constrained Soft Actor-Critic (SAC) algorithm optimizes continuous UAV trajectories while enforcing multi-dimensional QoS constraints through dynamically adjusted Lagrange multipliers. The framework supports centralized training with decentralized execution (CTDE) for multi-UAV coordination. This work is the first to embed discrete–continuous coupled decision-making into a hierarchical MARL architecture. Experimental results demonstrate that, while guaranteeing end-to-end QoS satisfaction (≥98.7%), the method significantly improves throughput (+32%) and reduces link handover frequency (−41%), outperforming all existing baselines across key metrics.
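As a rough illustration of the double Q-value estimation behind the top-tier link selection, the following minimal sketch computes the standard Double DQN bootstrap target for a single transition. The function name `ddqn_target`, the three candidate links, and all numeric values are hypothetical and are not taken from the paper; the paper's actual state, action, and reward definitions may differ.

```python
from typing import Sequence


def ddqn_target(reward: float, done: bool, gamma: float,
                q_online_next: Sequence[float],
                q_target_next: Sequence[float]) -> float:
    """Standard Double DQN target for one transition (illustrative only)."""
    if done:
        # No bootstrapping from a terminal state.
        return reward
    # The online network selects the greedy next action (here: a link index).
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    # The target network evaluates that action, which curbs the
    # overestimation that a single max over one network would introduce.
    return reward + gamma * q_target_next[best_action]


# Hypothetical next-state Q-values over three candidate links.
q_online_next = [1.0, 2.5, 2.0]   # online net prefers link 1
q_target_next = [1.2, 1.8, 3.0]   # target net would prefer link 2

# Bootstraps from q_target_next[1] = 1.8, not from max(q_target_next) = 3.0.
print(ddqn_target(1.0, False, 0.9, q_online_next, q_target_next))
```

Decoupling action selection from action evaluation is the design choice that distinguishes this target from the vanilla DQN target, which would bootstrap from the largest target-network value directly.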
📝 Abstract
Due to the significant variations in unmanned aerial vehicle (UAV) altitude and horizontal mobility, it is difficult for any single network to ensure continuous and reliable three-dimensional coverage. To that end, the space-air-ground integrated network (SAGIN) has emerged as an essential architecture for enabling ubiquitous UAV connectivity. To address the pronounced disparities in coverage and signal characteristics across heterogeneous networks, this paper formulates UAV mobility management in SAGIN as a constrained multi-objective joint optimization problem. The formulation couples discrete link selection with continuous trajectory optimization. Building on this, we propose a two-level multi-agent hierarchical deep reinforcement learning (HDRL) framework that decomposes the problem into two alternately solvable subproblems. To map complex link selection decisions into a compact discrete action space, we design a double deep Q-network (DDQN) algorithm at the top level, which achieves stable and high-quality policy learning through double Q-value estimation. To handle the continuous trajectory action space while satisfying quality of service (QoS) constraints, we integrate the maximum-entropy mechanism of the soft actor-critic (SAC) and employ a Lagrangian-based constrained SAC (CSAC) algorithm at the lower level that dynamically adjusts the Lagrange multipliers to balance constraint satisfaction and policy optimization. Moreover, the proposed algorithm can be extended to multi-UAV scenarios under the centralized training and decentralized execution (CTDE) paradigm, which enables more generalizable policies. Simulation results demonstrate that the proposed scheme substantially outperforms existing benchmarks in throughput, link switching frequency, and QoS satisfaction.
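The dynamic adjustment of Lagrange multipliers mentioned above is commonly realized as projected dual ascent. The sketch below shows that generic update, assuming one multiplier per QoS constraint; the function names, the two example constraints, the step size, and all numbers are hypothetical, and the paper's exact update rule may differ.

```python
from typing import List, Sequence


def update_multipliers(lams: Sequence[float], avg_costs: Sequence[float],
                       limits: Sequence[float], lr: float) -> List[float]:
    """Projected dual ascent: raise a multiplier while its constraint is
    violated (avg cost above limit), lower it otherwise, never below zero."""
    return [max(0.0, lam + lr * (cost - limit))
            for lam, cost, limit in zip(lams, avg_costs, limits)]


def penalized_reward(reward: float, costs: Sequence[float],
                     lams: Sequence[float]) -> float:
    """Lagrangian-relaxed reward that the actor would maximize."""
    return reward - sum(lam * cost for lam, cost in zip(lams, costs))


# Two hypothetical QoS constraints, e.g. outage rate and excess delay.
lams = [0.5, 0.05]
avg_costs = [0.3, 0.0]   # first constraint violated, second satisfied
limits = [0.1, 0.1]

lams = update_multipliers(lams, avg_costs, limits, lr=1.0)
# First multiplier grows, second is clipped at zero.
print(lams)
print(penalized_reward(2.0, avg_costs, lams))
```

A larger multiplier makes violations of that constraint more costly in the actor's objective, while a multiplier at zero leaves the policy free to pursue throughput; this is the balance between constraint satisfaction and policy optimization that the abstract describes.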