🤖 AI Summary
This work addresses the challenges of multi-agent reinforcement learning (MARL) in real-time strategy (RTS) games. We introduce an efficient, open-source Generals.io-based environment compatible with Gymnasium and PettingZoo, enabling simulation at over 1,000 frames per second. Methodologically, we propose a unified framework integrating supervised pretraining, self-play RL, and potential-based reward shaping, augmented with recurrent neural networks to model long-term dependencies under partial observability. Trained for only 36 hours on a single H100 GPU, our agent reaches the top 0.003% of the 1v1 human leaderboard. To our knowledge, this is the first MARL system to rapidly attain elite human-level performance in a lightweight RTS environment. Our contributions include a modular, scalable MARL benchmark platform and a state-of-the-art algorithmic baseline, advancing research in real-time, partially observable, and highly dynamic multi-agent learning.
📝 Abstract
We introduce a real-time strategy game environment built on Generals.io, a game that hosts thousands of active players each week across multiple game formats. Our environment is fully compatible with Gymnasium and PettingZoo, and runs at thousands of frames per second on commodity hardware. Our reference agent -- trained with supervised pre-training followed by self-play reinforcement learning -- reaches the top 0.003% of the 1v1 human leaderboard after just 36 hours on a single H100 GPU. To accelerate learning, we incorporate potential-based reward shaping and memory features. Our contributions -- a modular RTS benchmark and a competitive, state-of-the-art baseline agent -- provide an accessible yet challenging platform for advancing multi-agent reinforcement learning research.
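Potential-based reward shaping, mentioned above, adds `γ·Φ(s') − Φ(s)` to each environment reward without changing the optimal policy. Below is a minimal self-contained sketch of the idea; the potential function `phi` (tile ownership) and the episode data are hypothetical illustrations, not the paper's actual choices.

```python
# Sketch of potential-based reward shaping: r' = r + gamma*phi(s') - phi(s).
# phi here is a hypothetical potential (tiles owned in a Generals.io-like
# state); the paper's actual potential function may differ.

GAMMA = 0.99

def phi(state):
    """Hypothetical potential: number of tiles the agent owns."""
    return float(state["owned_tiles"])

def shaped_reward(reward, state, next_state, done):
    """Shaped reward; the terminal state's potential is treated as 0."""
    next_phi = 0.0 if done else phi(next_state)
    return reward + GAMMA * next_phi - phi(state)

# Telescoping check on a toy 3-step episode: the discounted shaped
# return equals the original return minus phi(s0), so relative
# policy values (and hence the optimal policy) are unchanged.
states = [{"owned_tiles": t} for t in (1, 3, 6, 10)]
rewards = [0.0, 0.0, 1.0]
orig = sum(GAMMA**t * r for t, r in enumerate(rewards))
shaped = sum(
    GAMMA**t * shaped_reward(r, states[t], states[t + 1], t == 2)
    for t, r in enumerate(rewards)
)
print(abs(shaped - (orig - phi(states[0]))) < 1e-9)  # True
```

Because the shaping terms telescope to a constant offset `−Φ(s0)`, the agent gets denser feedback (e.g., rewarding territory gains every step) while the underlying objective is preserved.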