Multi-Agent Reinforcement Learning Counteracts Delayed CSI in Multi-Satellite Systems

📅 2026-03-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the performance degradation in multi-satellite cooperative downlink systems caused by outdated channel state information (CSI), which results from the high propagation delay of satellite links. The authors propose dual-stage proximal policy optimization (DS-PPO), a multi-agent reinforcement learning algorithm built on a bi-level optimization: the first stage maximizes the sum-rate of each individual satellite, and the second stage maximizes the sum-rate when all satellites cooperate as a distributed multi-antenna base station. The method targets large continuous action spaces and independent but non-identically distributed (non-IID) environments. Numerical results show that DS-PPO is robust to CSI imperfections and improves the users' sum-rate; the authors also provide a convergence analysis and the computational complexity of the algorithm.

📝 Abstract

The integration of satellite communication networks with next-generation (NG) technologies is a promising approach towards global connectivity. However, the quality of service is highly dependent on the availability of accurate channel state information (CSI). Channel estimation in satellite communications is challenging due to the high propagation delay between terrestrial users and satellites, which results in outdated CSI observations on the satellite side. In this paper, we study the downlink transmission of multiple satellites acting as distributed base stations (BSs) to mobile terrestrial users. We propose a multi-agent reinforcement learning (MARL) algorithm which aims to maximise the sum-rate of the users while coping with the outdated CSI. We design a novel bi-level optimisation procedure, termed dual-stage proximal policy optimisation (DS-PPO), for tackling the problem of large continuous action spaces as well as of independent and non-identically distributed (non-IID) environments in MARL. Specifically, the first stage of DS-PPO maximises the sum-rate for an individual satellite and the second stage maximises the sum-rate when all the satellites cooperate to form a distributed multi-antenna BS. Our numerical results demonstrate the robustness of DS-PPO to CSI imperfections as well as the sum-rate improvement attained by the use of DS-PPO. In addition, we provide the convergence analysis for DS-PPO along with its computational complexity.
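To make the quantity being optimised concrete, the following is a minimal sketch of how a downlink sum-rate can degrade when precoding is based on outdated CSI. It is not the paper's system model or DS-PPO itself. Everything here is an illustrative assumption: single-antenna satellites, Rayleigh-like channels, a first-order Gauss-Markov ageing model with correlation `rho`, a matched-filter precoder, and all function names.

```python
import math
import random


def complex_gaussian(rng):
    """Draw one circularly symmetric complex Gaussian sample with unit variance."""
    std = math.sqrt(0.5)
    return complex(rng.gauss(0.0, std), rng.gauss(0.0, std))


def random_channel(num_users, num_sats, rng):
    """Hypothetical channel matrix: one row per user, one column per satellite."""
    return [[complex_gaussian(rng) for _ in range(num_sats)]
            for _ in range(num_users)]


def age_channel(h_old, rho, rng):
    """Assumed Gauss-Markov ageing: h_true = rho * h_old + sqrt(1 - rho^2) * e."""
    scale = math.sqrt(1.0 - rho ** 2)
    return [[rho * h + scale * complex_gaussian(rng) for h in row]
            for row in h_old]


def matched_filter_precoder(h_est, total_power):
    """Beamform user k with conj(h_k), scaled to meet a total power budget."""
    w = [[h.conjugate() for h in row] for row in h_est]
    norm_sq = sum(abs(x) ** 2 for row in w for x in row)
    scale = math.sqrt(total_power / norm_sq)
    return [[scale * x for x in row] for row in w]


def sum_rate(h_true, w, noise_power):
    """Sum over users of log2(1 + SINR_k), evaluated on the true channel."""
    rate = 0.0
    for k, h_k in enumerate(h_true):
        # Received power at user k from each user's beam.
        gains = [abs(sum(h * x for h, x in zip(h_k, w_j))) ** 2 for w_j in w]
        interference = sum(g for j, g in enumerate(gains) if j != k)
        rate += math.log2(1.0 + gains[k] / (interference + noise_power))
    return rate


if __name__ == "__main__":
    rng = random.Random(0)
    h_old = random_channel(num_users=2, num_sats=4, rng=rng)
    w = matched_filter_precoder(h_old, total_power=10.0)
    for rho in (1.0, 0.9, 0.5):
        h_true = age_channel(h_old, rho, rng)
        print(f"rho={rho}: sum-rate {sum_rate(h_true, w, 1.0):.3f} bit/s/Hz")
```

The precoder only ever sees `h_old`, while the rate is evaluated on `h_true`, which mirrors the mismatch the abstract describes. Averaging `sum_rate` over many channel draws for several values of `rho` gives a simple baseline against which a learned policy could be compared.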

Problem

Research questions and friction points this paper is trying to address.

delayed CSI

multi-satellite systems

channel state information

propagation delay

downlink transmission

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Agent Reinforcement Learning

Delayed CSI

DS-PPO