🤖 AI Summary
To address the challenge of jointly ensuring QoS, energy efficiency, and resource fairness in UAV-satellite collaborative network slicing for 6G non-terrestrial networks, this paper proposes a hierarchical network slicing architecture that jointly optimizes UAV trajectory, transmit power, and spectrum allocation. The authors formulate the problem as a decentralized partially observable Markov decision process (Dec-POMDP) and design a multi-agent reinforcement learning algorithm, trained centrally and executed in a decentralized manner, that uses multi-head attention to coordinate resources efficiently and adaptively under dynamic environmental conditions. Experimental results show that the proposed method achieves up to a 33% improvement in cumulative reward, a 27.5% gain in energy efficiency, and a 19.8% increase in Jain’s fairness index over baseline approaches, improving service quality, energy sustainability, and allocation fairness at the same time.
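For readers unfamiliar with the fairness metric cited above, the following is a minimal sketch of Jain's fairness index in its standard textbook form, J(x) = (Σ x_i)² / (n · Σ x_i²). The function name `jain_index` and the example allocations are illustrative assumptions; the paper's exact definition of the per-user or per-slice quantity being compared is not given here.

```python
def jain_index(allocations):
    """Jain's fairness index for a list of non-negative allocations.

    Returns a value in [1/n, 1]: 1 when all shares are equal,
    1/n when a single user receives everything.
    """
    n = len(allocations)
    total = sum(allocations)
    sum_sq = sum(x * x for x in allocations)
    if n == 0 or sum_sq == 0:
        raise ValueError("need at least one non-zero allocation")
    return (total * total) / (n * sum_sq)


# Hypothetical per-slice throughputs, for illustration only.
equal_shares = [2.0, 2.0, 2.0, 2.0]
one_user_only = [4.0, 0.0, 0.0, 0.0]

print(jain_index(equal_shares))   # perfectly fair allocation
print(jain_index(one_user_only))  # maximally unfair allocation
```

An equal split yields an index of 1, while giving everything to one of four users yields 1/4, so a reported increase in the index means the allocations moved closer to an equal split.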
📝 Abstract
Non-terrestrial networks are critical for achieving global 6G coverage, yet efficient resource management in aerial and space environments remains challenging due to limited onboard power and dynamic operational conditions. Network slicing offers a promising solution for spectrum optimization in UAV-based systems serving heterogeneous service demands. To this end, this paper proposes a hierarchical network slicing framework for UAV-satellite integrated networks supporting eMBB, URLLC, and mMTC services. Specifically, we formulate the joint optimization of UAV trajectory, transmission power, and spectrum allocation as a decentralized partially observable Markov decision process whose objective is to ensure quality of service while minimizing energy consumption and maximizing resource fairness. To address the computational intractability and partial observability, we develop a multi-agent deep reinforcement learning solution under the centralized training and decentralized execution paradigm. In the proposed system, UAV agents act as distributed actors coordinated by a shared critic that uses a multi-head attention mechanism and is hosted on a low Earth orbit satellite. Experimental results demonstrate that our approach outperforms existing methods by up to 33% in cumulative reward while achieving superior energy efficiency and fairness.
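To make the role of attention in a shared critic more concrete, the sketch below applies generic multi-head scaled dot-product attention across per-agent feature vectors, so that each UAV's representation becomes a weighted mix of all agents' features. This is not the paper's architecture: the function name `multi_head_attention`, the dimensions, the random weights, and the use of NumPy are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(agent_feats, w_q, w_k, w_v, num_heads):
    """Generic multi-head self-attention over agents.

    agent_feats: (n_agents, d_model) array, one row per UAV agent.
    w_q, w_k, w_v: (d_model, d_model) projection matrices.
    Returns (mixed_feats, weights) with shapes
    (n_agents, d_model) and (num_heads, n_agents, n_agents).
    """
    n_agents, d_model = agent_feats.shape
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads

    def split_heads(x):
        # (n_agents, d_model) -> (num_heads, n_agents, d_head)
        return x.reshape(n_agents, num_heads, d_head).transpose(1, 0, 2)

    q = split_heads(agent_feats @ w_q)
    k = split_heads(agent_feats @ w_k)
    v = split_heads(agent_feats @ w_v)

    # Pairwise agent-to-agent scores per head, scaled by sqrt(d_head).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)

    # Weighted sum of values, then merge heads back to d_model.
    mixed = (weights @ v).transpose(1, 0, 2).reshape(n_agents, d_model)
    return mixed, weights


# Hypothetical setup: 3 UAV agents, 8-dimensional features, 2 heads.
rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
mixed, weights = multi_head_attention(feats, w_q, w_k, w_v, num_heads=2)
print(mixed.shape, weights.shape)
```

In a centralized-training setup, a critic could feed such mixed features into a value head during training, while each actor acts on its local observation alone at execution time; the details of how the paper does this are not specified in the abstract.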