🤖 AI Summary
This work addresses the 1v1 high-speed pursuit–evasion control problem for quadrotors, motivated by intercepting drones that intrude into no-fly zones. We propose an end-to-end reinforcement learning framework that directly outputs body-frame angular rates and total thrust commands, fully exploiting the nonlinear dynamical limits of the platform. To overcome non-stationarity and catastrophic forgetting in adversarial training, we design an Asynchronous Multi-Stage Population-Based (AMSPB) training algorithm that enables continuous co-evolution of pursuer and evader policies while ensuring monotonic policy improvement and retention of historical capabilities. Experiments demonstrate that our action-level policy achieves a 42% higher capture rate and 3.1× greater peak velocity than velocity-level baseline controllers. Moreover, AMSPB yields stable, monotonic win-rate improvements against diverse benchmark opponents, significantly enhancing robustness and generalization in pursuit–evasion tasks.
📝 Abstract
The increasing proliferation of small UAVs in civilian and military airspace has raised critical safety and security concerns, especially when unauthorized or malicious drones enter restricted zones. In this work, we present a reinforcement learning (RL) framework for agile 1v1 quadrotor pursuit–evasion. We train neural network policies to command body rates and collective thrust, enabling high-speed pursuit and evasive maneuvers that fully exploit the quadrotor's nonlinear dynamics. To mitigate non-stationarity and catastrophic forgetting during adversarial co-training, we introduce an Asynchronous Multi-Stage Population-Based (AMSPB) algorithm where, at each stage, either the pursuer or evader learns against a sampled opponent drawn from a growing population of past and current policies. This continual learning setup ensures monotonic performance improvement and retention of earlier strategies. Our results show that (i) rate-based policies achieve significantly higher capture rates and peak speeds than velocity-level baselines, and (ii) AMSPB yields stable, monotonic gains against a suite of benchmark opponents.
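The staged population-based loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `train_stage` is a hypothetical placeholder for one full RL training stage (e.g., policy-gradient updates against a frozen opponent), and the policy objects are stand-in records. What it shows is the AMSPB structure itself: pursuer and evader alternate as the learner, the opponent is sampled from a growing population of past and current policies, and each stage's result is appended so earlier strategies remain available.

```python
import random

def train_stage(role, opponent, stage):
    """Stand-in for RL training of `role` against a frozen `opponent`.

    In a real system this would run many environment rollouts and
    gradient updates; here it just returns a tagged policy record.
    """
    return {"role": role, "stage": stage, "trained_vs": opponent["stage"]}

def amspb(num_stages, seed=0):
    rng = random.Random(seed)
    # Each side starts with a naive initial policy (stage 0).
    populations = {
        "pursuer": [{"role": "pursuer", "stage": 0, "trained_vs": None}],
        "evader": [{"role": "evader", "stage": 0, "trained_vs": None}],
    }
    for stage in range(1, num_stages + 1):
        # Alternate which side learns; the other side supplies opponents.
        learner = "pursuer" if stage % 2 == 1 else "evader"
        other = "evader" if learner == "pursuer" else "pursuer"
        # Sample an opponent from the growing population of past and
        # current policies, so old strategies are never forgotten.
        opponent = rng.choice(populations[other])
        new_policy = train_stage(learner, opponent, stage)
        populations[learner].append(new_policy)  # population only grows
    return populations

pops = amspb(num_stages=6)
print(len(pops["pursuer"]), len(pops["evader"]))  # → 4 4
```

Sampling opponents from the whole population, rather than always facing the latest policy, is what counters the non-stationarity and catastrophic forgetting of naive self-play: a learner that exploits only the current opponent is immediately re-tested against older strategies it might otherwise forget how to beat.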