Mastering Multi-Drone Volleyball through Hierarchical Co-Self-Play Reinforcement Learning

📅 2025-05-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the embodied competitive task of 3v3 multi-UAV volleyball, which poses challenges including long-horizon dependencies, strong inter-agent coupling, and underactuated quadrotor dynamics. We propose the Hierarchical Cooperative Self-Play (HCSP) framework, introducing a novel three-stage population training pipeline that jointly optimizes high-level strategic behaviors—such as emergent role switching and coordinated formations—and low-level agile control, all learned from scratch. HCSP adopts a “centralized high-level planning + decentralized low-level execution” architecture, integrating hierarchical reinforcement learning with multi-agent self-play. In simulation, HCSP achieves an average win rate of 82.9%, significantly outperforming non-hierarchical self-play and rule-based baselines, and wins 71.5% of matches against a two-stage ablation variant. Furthermore, the policy is successfully deployed on real quadrotor platforms. This work establishes a scalable paradigm for competitive, embodied multi-agent intelligence.

📝 Abstract
In this paper, we tackle the problem of learning to play 3v3 multi-drone volleyball, a new embodied competitive task that requires both high-level strategic coordination and low-level agile control. The task is turn-based, multi-agent, and physically grounded, posing significant challenges due to its long-horizon dependencies, tight inter-agent coupling, and the underactuated dynamics of quadrotors. To address this, we propose Hierarchical Co-Self-Play (HCSP), a hierarchical reinforcement learning framework that separates centralized high-level strategic decision-making from decentralized low-level motion control. We design a three-stage population-based training pipeline to enable both strategy and skill to emerge from scratch without expert demonstrations: (I) training diverse low-level skills, (II) learning high-level strategy via self-play with fixed low-level controllers, and (III) joint fine-tuning through co-self-play. Experiments show that HCSP achieves superior performance, outperforming non-hierarchical self-play and rule-based hierarchical baselines with an average 82.9% win rate and a 71.5% win rate against the two-stage variant. Moreover, co-self-play leads to emergent team behaviors such as role switching and coordinated formations, demonstrating the effectiveness of our hierarchical design and training scheme.
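The three-stage pipeline above can be sketched as a minimal skeleton. This is an illustrative assumption, not the authors' code: `Policy`, `stage1_train_skills`, and the counter-based "updates" stand in for real RL training loops, and the freeze/unfreeze logic only mirrors the stage structure (train skills, then strategy with frozen controllers, then joint co-self-play).

```python
# Hypothetical sketch of the three-stage HCSP training pipeline from the abstract.
# All names and the counter-based "training" are illustrative placeholders.
from dataclasses import dataclass


@dataclass
class Policy:
    name: str
    frozen: bool = False
    updates: int = 0

    def step(self) -> None:
        # A stand-in for one gradient update; frozen policies are skipped.
        if not self.frozen:
            self.updates += 1


def stage1_train_skills(num_skills: int, iters: int) -> list[Policy]:
    """Stage I: train a diverse population of low-level control skills."""
    skills = [Policy(f"skill_{i}") for i in range(num_skills)]
    for _ in range(iters):
        for s in skills:
            s.step()
    return skills


def stage2_high_level_self_play(skills: list[Policy], iters: int) -> Policy:
    """Stage II: learn high-level strategy via self-play with frozen low-level controllers."""
    for s in skills:
        s.frozen = True
    strategy = Policy("strategy")
    for _ in range(iters):
        strategy.step()       # self-play update against past opponents
        for s in skills:
            s.step()          # no-op: low-level controllers stay fixed
    return strategy


def stage3_co_self_play(strategy: Policy, skills: list[Policy], iters: int) -> None:
    """Stage III: unfreeze both levels and jointly fine-tune via co-self-play."""
    for s in skills:
        s.frozen = False
    for _ in range(iters):
        strategy.step()
        for s in skills:
            s.step()


skills = stage1_train_skills(num_skills=3, iters=10)
strategy = stage2_high_level_self_play(skills, iters=5)
stage3_co_self_play(strategy, skills, iters=4)
print(strategy.updates, skills[0].updates)  # → 9 14
```

Note the key design point the sketch preserves: low-level skills receive no updates during Stage II (they end with 10 + 4 = 14 updates, all from Stages I and III), while the strategy accumulates updates in Stages II and III (5 + 4 = 9).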
Problem

Research questions and friction points this paper is trying to address.

Learning 3v3 multi-drone volleyball with strategic and agile control
Addressing long-horizon dependencies and tight inter-agent coupling
Developing hierarchical reinforcement learning for centralized-decentralized control
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical reinforcement learning for multi-drone control
Three-stage population-based training pipeline
Co-self-play for emergent team behaviors