🤖 AI Summary
Problem: Autonomous collaborative Earth observation by low-Earth-orbit (LEO) satellite constellations is challenged by dynamic orbital environments, severe resource constraints (e.g., energy and onboard storage), and partial observability. Method: We build a near-realistic multi-satellite dynamics and mission simulation environment, first modeling single-satellite operations and then extending to multi-satellite constellations, and systematically evaluate RL and MARL algorithms (PPO, IPPO, MAPPO, and HAPPO) in decentralized, partially observable Markov games. We analyze training stability in light of the non-stationarity and reward coupling inherent in multi-satellite Earth observation. Contribution/Results: MARL policies effectively balance imaging against energy and storage management under decentralized coordination. The study lays a foundation for autonomous on-orbit decision-making and offers practical guidelines for improving policy learning in decentralized EO missions.
📝 Abstract
The exponential growth in the number of Low Earth Orbit (LEO) satellites has revolutionised Earth Observation (EO) missions, addressing challenges in climate monitoring, disaster management, and beyond. However, autonomous coordination in multi-satellite systems remains a fundamental challenge. Traditional optimisation approaches struggle to meet the real-time decision-making demands of dynamic EO missions, motivating the use of Reinforcement Learning (RL) and Multi-Agent Reinforcement Learning (MARL). In this paper, we investigate RL-based autonomous EO mission planning by modelling single-satellite operations and then extending them to multi-satellite constellations using MARL frameworks. We address key challenges, including energy and data storage limitations, uncertainty in satellite observations, and the complexity of decentralised coordination under partial observability. Using a near-realistic satellite simulation environment, we evaluate the training stability and performance of state-of-the-art RL and MARL algorithms, namely PPO, IPPO, MAPPO, and HAPPO. Our results demonstrate that MARL can effectively balance imaging and resource management while addressing non-stationarity and reward interdependency in multi-satellite coordination. These insights provide a foundation for autonomous satellite operations and offer practical guidelines for improving policy learning in decentralised EO missions.
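The algorithms compared here build on PPO's clipped surrogate objective; IPPO and MAPPO differ chiefly in what each agent's value function conditions on (its own local observation versus a centralised view during training). The NumPy sketch below illustrates both ideas. It is illustrative only: the function names, the clip value `eps=0.2`, and the use of concatenated observations as a stand-in for MAPPO's global state are assumptions, not the paper's implementation. HAPPO, which additionally updates the agents' policies sequentially, is not shown.

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """Negative PPO clipped surrogate objective (a loss to be minimised).

    Hypothetical helper: logp_* are log-probabilities of the taken actions
    under the new and old policies; eps is an assumed clip range.
    """
    ratio = np.exp(logp_new - logp_old)            # pi_new(a|o) / pi_old(a|o)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # Pessimistic (elementwise minimum) bound, averaged over samples.
    return -np.mean(np.minimum(unclipped, clipped))

def critic_inputs(local_obs, mode):
    """Build per-agent value-function inputs (hypothetical helper).

    local_obs: array of shape (n_agents, obs_dim), one row per satellite.
    'ippo':  each critic sees only its own observation (decentralised).
    'mappo': each critic sees all agents' observations concatenated, an
             assumed stand-in for the global state used in centralised training.
    """
    n_agents = local_obs.shape[0]
    if mode == "ippo":
        return local_obs
    if mode == "mappo":
        joint = local_obs.reshape(-1)              # shape (n_agents * obs_dim,)
        return np.tile(joint, (n_agents, 1))       # same joint input for every agent
    raise ValueError(f"unknown mode: {mode}")
```

For three satellites with 4-dimensional observations, `critic_inputs(obs, "mappo")` returns an array of shape (3, 12), whereas the IPPO variant keeps shape (3, 4). At execution time both variants act from local observations only; the centralised critic is used only during training.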