Learned Controllers for Agile Quadrotors in Pursuit-Evasion Games

📅 2025-06-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the 1v1 high-speed pursuit–evasion control problem for quadrotor drones intruding into no-fly zones. We propose an end-to-end reinforcement learning framework that directly outputs body-frame angular rates and total thrust commands, fully exploiting the nonlinear dynamical limits of the platform. To overcome non-stationarity and catastrophic forgetting in adversarial training, we design an Asynchronous Multi-Stage Population-Based (AMSPB) training algorithm, enabling continuous co-evolution of pursuer and evader policies while ensuring monotonic policy improvement and retention of historical capabilities. Experiments demonstrate that our action-level policy achieves a 42% higher capture rate and 3.1× greater peak velocity compared to velocity-level baseline controllers. Moreover, AMSPB yields stable, monotonic win-rate improvements against diverse benchmark opponents, significantly enhancing robustness and generalization in pursuit–evasion tasks.

📝 Abstract
The increasing proliferation of small UAVs in civilian and military airspace has raised critical safety and security concerns, especially when unauthorized or malicious drones enter restricted zones. In this work, we present a reinforcement learning (RL) framework for agile 1v1 quadrotor pursuit-evasion. We train neural network policies to command body rates and collective thrust, enabling high-speed pursuit and evasive maneuvers that fully exploit the quadrotor's nonlinear dynamics. To mitigate nonstationarity and catastrophic forgetting during adversarial co-training, we introduce an Asynchronous Multi-Stage Population-Based (AMSPB) algorithm where, at each stage, either the pursuer or evader learns against a sampled opponent drawn from a growing population of past and current policies. This continual learning setup ensures monotonic performance improvement and retention of earlier strategies. Our results show that (i) rate-based policies achieve significantly higher capture rates and peak speeds than velocity-level baselines, and (ii) AMSPB yields stable, monotonic gains against a suite of benchmark opponents.
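The abstract describes the core AMSPB loop: at each stage, either the pursuer or the evader trains against an opponent sampled from a growing population of past and current policies, and the newly trained policy is then added to its population. A minimal sketch of that stage loop, with hypothetical names (`amspb`, `train_stage`) and a placeholder training function standing in for the paper's RL update:

```python
import random

def amspb(num_stages, train_stage, seed=0):
    """Sketch of an asynchronous multi-stage population-based loop:
    pursuer and evader alternate as the learner, each training against
    an opponent sampled from a growing population of frozen policies."""
    rng = random.Random(seed)
    # Each population starts from an initial policy and only ever grows,
    # so earlier strategies are retained rather than overwritten.
    populations = {"pursuer": ["pursuer_v0"], "evader": ["evader_v0"]}
    for stage in range(num_stages):
        learner = "pursuer" if stage % 2 == 0 else "evader"
        opponent_role = "evader" if learner == "pursuer" else "pursuer"
        # Sample any past or current opponent policy for this stage.
        opponent = rng.choice(populations[opponent_role])
        new_policy = train_stage(learner, opponent, stage)
        populations[learner].append(new_policy)
    return populations
```

In the paper the `train_stage` step would be an RL run (training body-rate/thrust policies); here any callable returning a policy identifier suffices to illustrate the co-evolution schedule.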
Problem

Research questions and friction points this paper is trying to address.

Develop RL framework for agile quadrotor pursuit-evasion games
Address nonstationarity in adversarial co-training via AMSPB algorithm
Enhance capture rates and speeds with rate-based neural policies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement learning for agile quadrotor control
Asynchronous Multi-Stage Population-Based training algorithm
Neural network policies for high-speed maneuvers
Alejandro Sanchez Roncero
Robotics, Perception and Learning Lab., School of Electrical Engineering and Computer Science, Royal Institute of Technology (KTH), SE-100 44 Stockholm, Sweden
Olov Andersson
Assistant Professor at KTH Royal Institute of Technology. Previously: ASL@ETH Zurich
Robot Learning · Autonomous Robots · Motion Planning · Mapping · Navigation
Petter Ogren
Robotics, Perception and Learning Lab., School of Electrical Engineering and Computer Science, Royal Institute of Technology (KTH), SE-100 44 Stockholm, Sweden