🤖 AI Summary
This work addresses the challenge of eliciting agile maneuvering and strategic behavior in drone racing, both of which are difficult to obtain through hand-crafted reward shaping. We propose a multi-agent competitive reinforcement learning framework guided by a sparse, high-level objective: winning the race. Methodologically, we train end-to-end racing policies with PPO in domain-randomized simulation; the policies directly output low-level control commands without explicit kinematic rewards. To our knowledge, this is the first demonstration of such a paradigm on real quadrotor platforms. Results show (i) a 47% improvement in real-world race win rate on complex tracks with dynamic obstacles; (ii) a 3.2× increase in sim-to-real transfer success over single-agent baselines; and (iii) some degree of generalization to opponents unseen during training. Our core contribution is empirical evidence that sparse, task-level rewards in multi-agent competition are sufficient to induce extreme flight capabilities and high-level racing strategies, while also improving robustness in physical deployment.
📝 Abstract
Through multi-agent competition and the sparse high-level objective of winning a race, we find that both agile flight (e.g., high-speed motion pushing the platform to its physical limits) and strategy (e.g., overtaking or blocking) emerge from agents trained with reinforcement learning. We provide evidence in both simulation and the real world that this approach outperforms the common paradigm of training agents in isolation with rewards that prescribe behavior (e.g., progress along the raceline), particularly as the complexity of the environment increases (e.g., in the presence of obstacles). Moreover, multi-agent competition yields policies that transfer more reliably to the real world than policies trained with a single-agent progress-based reward, even though both methods use the same simulation environment, randomization strategy, and hardware. Beyond improved sim-to-real transfer, the multi-agent policies also exhibit some degree of generalization to opponents unseen at training time. Overall, our work, following in the tradition of multi-agent competitive game-play in digital domains, shows that sparse task-level rewards are sufficient for training agents capable of advanced low-level control in the physical world.
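To make the contrast between the two reward paradigms concrete, here is a minimal sketch in Python. The `RaceState` structure, the per-step progress measure, and the ±1 terminal payoff are illustrative assumptions, not the authors' reward definitions (those live in the linked repository): the dense progress reward pays the agent for every meter gained along the raceline, while the sparse competitive reward stays at zero until a winner is decided.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RaceState:
    """Hypothetical per-step race state (illustrative, not the authors' data structure)."""
    progress: float          # arc length travelled along the raceline, in meters
    winner: Optional[str]    # id of the first drone to finish, or None while racing


def progress_reward(prev: RaceState, curr: RaceState) -> float:
    """Dense single-agent reward: meters gained along the raceline during this step."""
    return curr.progress - prev.progress


def sparse_win_reward(curr: RaceState, ego_id: str) -> float:
    """Sparse competitive reward: nonzero only once the race is decided."""
    if curr.winner is None:
        return 0.0  # race still in progress: no shaping signal at all
    return 1.0 if curr.winner == ego_id else -1.0


if __name__ == "__main__":
    before = RaceState(progress=10.0, winner=None)
    after = RaceState(progress=12.5, winner=None)
    print(progress_reward(before, after))      # 2.5
    print(sparse_win_reward(after, "ego"))     # 0.0
    finish = RaceState(progress=100.0, winner="ego")
    print(sparse_win_reward(finish, "ego"))    # 1.0
```

Under the sparse reward, behaviors such as overtaking or blocking are never scored directly; in this framing they can only be reinforced indirectly, by raising the chance of eventually receiving the terminal +1.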
Code: https://github.com/Jirl-upenn/AgileFlight_MultiAgent