🤖 AI Summary
Problem: Conventional deterministic approaches (e.g., Voronoi partitioning, extremum seeking control) achieve low coverage efficiency and adapt poorly when applied to small high-altitude balloon (HAB) teams performing localized stratospheric missions (e.g., reconnaissance, environmental monitoring).
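For reference, a Voronoi-based deterministic controller can be sketched as Lloyd-style coverage control on a discretized target area. Each target point is assigned to its nearest agent, forming a discrete Voronoi cell, and each agent is then steered toward its cell's centroid. This is a minimal illustration with hypothetical function names, not the baseline used in the paper. It also assumes agents can move freely toward any goal, an idealization that does not hold for HABs, whose horizontal motion is limited to what the available wind layers allow.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def voronoi_cells(agents: List[Point], targets: List[Point]) -> Dict[int, List[Point]]:
    """Discrete Voronoi partition: assign every target point to its nearest agent."""
    cells: Dict[int, List[Point]] = {i: [] for i in range(len(agents))}
    for t in targets:
        nearest = min(range(len(agents)), key=lambda i: math.dist(agents[i], t))
        cells[nearest].append(t)
    return cells


def lloyd_goals(agents: List[Point], targets: List[Point]) -> List[Point]:
    """Centroid of each agent's cell, i.e. where Lloyd-style coverage control steers it.

    Idealized (hypothetical helper): assumes agents can fly directly to their goals,
    which wind-driven HABs cannot do.
    """
    cells = voronoi_cells(agents, targets)
    goals: List[Point] = []
    for i, agent in enumerate(agents):
        pts = cells[i]
        if not pts:
            goals.append(agent)  # empty cell: hold the current position
        else:
            cx = sum(p[0] for p in pts) / len(pts)
            cy = sum(p[1] for p in pts) / len(pts)
            goals.append((cx, cy))
    return goals
```

Iterating `lloyd_goals` and moving each agent toward its goal converges toward a centroidal Voronoi configuration, which is the geometric notion of "optimal" coverage that such methods target.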
Method: This work is the first to systematically apply multi-agent reinforcement learning (MARL) to cooperative area coverage by HABs. It adapts QMIX within a centralized-training, decentralized-execution (CTDE) framework, designs an observation space that combines each agent's own state, the atmospheric wind field, and teammate positions, and introduces a Voronoi-inspired hierarchical reward that jointly optimizes coverage quality and spatial uniformity.
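QMIX combines per-agent Q-values into a joint value Q_tot through a mixing network whose weights are generated from the global state by hypernetworks and constrained to be non-negative. That monotonicity is what makes centralized training with decentralized execution work: each balloon greedily maximizing its own Q-values also maximizes Q_tot. The pure-Python sketch below shows only the mixer and the greedy decentralized step. It uses random, untrained weights, single linear hypernetworks, and a hypothetical state vector (for example, balloon positions plus wind features); it is an illustration of the technique, not the paper's implementation.

```python
import math
import random
from itertools import product
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]


def elu(x: float) -> float:
    """ELU activation (used in the QMIX mixer); strictly increasing."""
    return x if x > 0 else math.exp(x) - 1.0


def make_linear(rng: random.Random, n_in: int, n_out: int) -> Layer:
    """Random (n_out x n_in) weights and bias standing in for trained parameters."""
    w = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n_out)]
    return w, b


def apply_linear(layer: Layer, x: Sequence[float]) -> List[float]:
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


class QMixer:
    """Monotonic mixer Q_tot = f(Q_1..Q_n; s). Illustrative sketch, untrained weights."""

    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 8, seed: int = 0):
        rng = random.Random(seed)
        self.n_agents, self.embed_dim = n_agents, embed_dim
        # Hypernetworks: map the global state s to the mixer's weights and biases.
        self.hyper_w1 = make_linear(rng, state_dim, n_agents * embed_dim)
        self.hyper_b1 = make_linear(rng, state_dim, embed_dim)
        self.hyper_w2 = make_linear(rng, state_dim, embed_dim)
        self.hyper_v = make_linear(rng, state_dim, 1)  # state-value bias (a small MLP in QMIX)

    def __call__(self, agent_qs: Sequence[float], state: Sequence[float]) -> float:
        # abs() makes every mixing weight non-negative, so dQ_tot/dQ_a >= 0 for each agent a.
        w1 = [abs(v) for v in apply_linear(self.hyper_w1, state)]
        b1 = apply_linear(self.hyper_b1, state)
        w2 = [abs(v) for v in apply_linear(self.hyper_w2, state)]
        v = apply_linear(self.hyper_v, state)[0]
        hidden = [
            elu(sum(agent_qs[a] * w1[a * self.embed_dim + k] for a in range(self.n_agents)) + b1[k])
            for k in range(self.embed_dim)
        ]
        return sum(h * w for h, w in zip(hidden, w2)) + v


def decentralized_greedy(agent_q_tables: Sequence[Sequence[float]]) -> Tuple[int, ...]:
    """Decentralized execution: each agent takes the argmax of its own Q-values only."""
    return tuple(max(range(len(q)), key=lambda u: q[u]) for q in agent_q_tables)


def centralized_argmax(mixer: QMixer, agent_q_tables, state) -> Tuple[int, ...]:
    """Brute-force argmax of Q_tot over all joint actions (exponential; for checks only)."""
    joint_actions = product(*[range(len(q)) for q in agent_q_tables])
    return max(joint_actions,
               key=lambda u: mixer([q[a] for q, a in zip(agent_q_tables, u)], state))
```

Because every mixing weight is non-negative and ELU is increasing, `centralized_argmax` and `decentralized_greedy` should agree whenever each agent's Q-values have a unique maximum. The brute-force search grows exponentially with team size and is only meant for small sanity checks.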
Contribution/Results: Evaluated in the RLHAB simulation environment, the approach achieves coverage performance close to that of the theoretically optimal geometric method while improving autonomy and robustness for small HAB teams. The results empirically validate the effectiveness of MARL for coordinated HAB control and its potential to extend to more complex multi-HAB missions.
📝 Abstract
High Altitude Balloons (HABs) can leverage stratospheric wind layers for limited horizontal control, enabling applications in reconnaissance, environmental monitoring, and communications networks. Existing multi-agent HAB coordination approaches rely on deterministic methods, such as Voronoi partitioning and extremum seeking control, designed for large global constellations; these methods perform poorly for smaller teams and localized missions. While single-agent reinforcement learning control has been demonstrated on HABs, coordinated multi-agent reinforcement learning (MARL) has not yet been investigated. This work presents the first systematic application of MARL to HAB coordination for distributed area coverage. We extend our previously developed reinforcement learning simulation environment (RLHAB) to support cooperative multi-agent learning, enabling multiple agents to operate simultaneously in realistic atmospheric conditions. We adapt QMIX for HAB area coverage coordination, leveraging Centralized Training with Decentralized Execution (CTDE) to address the challenges of coordinating atmospheric vehicles. Our approach employs specialized observation spaces that provide individual state, environmental context, and teammate data, together with hierarchical rewards that prioritize coverage while encouraging spatial distribution. We demonstrate that QMIX achieves performance similar to that of the theoretically optimal geometric deterministic method for distributed area coverage, validating the MARL approach and providing a foundation for more complex autonomous multi-HAB missions where deterministic methods become intractable.
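The hierarchical reward is described only qualitatively. One way to encode "coverage first, spatial distribution second" as the single shared team reward that QMIX trains on is a lexicographic weighting, sketched below. This is an assumption, not the paper's reward: the sensor radius, minimum separation, pairwise-spacing term, and the `0.5 / len(targets)` scale are all hypothetical choices.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

Point = Tuple[float, float]


def coverage_fraction(balloons: Sequence[Point], targets: Sequence[Point],
                      sensor_radius: float) -> float:
    """Fraction of target points within sensor_radius of at least one balloon."""
    covered = sum(1 for t in targets
                  if any(math.dist(b, t) <= sensor_radius for b in balloons))
    return covered / len(targets)


def spread_fraction(balloons: Sequence[Point], min_sep: float) -> float:
    """Fraction of balloon pairs at least min_sep apart (1.0 when there are no pairs)."""
    pairs = list(combinations(balloons, 2))
    if not pairs:
        return 1.0
    return sum(1 for a, b in pairs if math.dist(a, b) >= min_sep) / len(pairs)


def team_reward(balloons: Sequence[Point], targets: Sequence[Point],
                sensor_radius: float, min_sep: float) -> float:
    """Hypothetical shared team reward: coverage first, spacing only as a tie-breaker.

    Covering one more target raises coverage by 1/len(targets). Capping the spread
    bonus at half that step means better spacing can never outweigh worse coverage.
    """
    coverage = coverage_fraction(balloons, targets, sensor_radius)
    spread_scale = 0.5 / len(targets)
    return coverage + spread_scale * spread_fraction(balloons, min_sep)
```

With two target points 10 units apart, a sensor radius of 1, and a minimum separation of 5, balloons placed on both targets score 1.25. Two clustered balloons covering only one target score 0.5, and two well-separated balloons covering one target score 0.75, so spacing breaks ties without ever trading away coverage.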