🤖 AI Summary
This work addresses the 1v1 high-speed pursuit–evasion control problem for quadrotors, motivated by intercepting drones that intrude into no-fly zones. We propose an end-to-end reinforcement learning framework that directly outputs body-frame angular rates and total thrust commands, fully exploiting the nonlinear dynamical limits of the platform. To overcome non-stationarity and catastrophic forgetting in adversarial training, we design an Asynchronous Multi-Stage Population-Based (AMSPB) training algorithm that enables continuous co-evolution of pursuer and evader policies while ensuring monotonic policy improvement and retention of historical capabilities. Experiments demonstrate that our action-level policy achieves a 42% higher capture rate and 3.1× greater peak velocity than velocity-level baseline controllers. Moreover, AMSPB yields stable, monotonic win-rate improvements against diverse benchmark opponents, significantly enhancing robustness and generalization in pursuit–evasion tasks.
📝 Abstract
The increasing proliferation of small UAVs in civilian and military airspace has raised critical safety and security concerns, especially when unauthorized or malicious drones enter restricted zones. In this work, we present a reinforcement learning (RL) framework for agile 1v1 quadrotor pursuit–evasion. We train neural network policies to command body rates and collective thrust, enabling high-speed pursuit and evasive maneuvers that fully exploit the quadrotor's nonlinear dynamics. To mitigate non-stationarity and catastrophic forgetting during adversarial co-training, we introduce an Asynchronous Multi-Stage Population-Based (AMSPB) algorithm where, at each stage, either the pursuer or evader learns against a sampled opponent drawn from a growing population of past and current policies. This continual learning setup ensures monotonic performance improvement and retention of earlier strategies. Our results show that (i) rate-based policies achieve significantly higher capture rates and peak speeds than velocity-level baselines, and (ii) AMSPB yields stable, monotonic gains against a suite of benchmark opponents.
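The staged population-based loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `train_stage` is a hypothetical placeholder for one full RL training stage (e.g., policy-gradient updates against a frozen opponent), and the policy objects are stand-in records. What it shows is the AMSPB structure itself: pursuer and evader alternate as the learner, the opponent is sampled from a growing population of past and current policies, and each stage's result is appended so earlier strategies remain available.

```python
import random

def train_stage(role, opponent, stage):
    """Stand-in for RL training of `role` against a frozen `opponent`.

    In a real system this would run many environment rollouts and
    gradient updates; here it just returns a tagged policy record.
    """
    return {"role": role, "stage": stage, "trained_vs": opponent["stage"]}

def amspb(num_stages, seed=0):
    rng = random.Random(seed)
    # Each side starts with a naive initial policy (stage 0).
    populations = {
        "pursuer": [{"role": "pursuer", "stage": 0, "trained_vs": None}],
        "evader": [{"role": "evader", "stage": 0, "trained_vs": None}],
    }
    for stage in range(1, num_stages + 1):
        # Alternate which side learns; the other side supplies opponents.
        learner = "pursuer" if stage % 2 == 1 else "evader"
        other = "evader" if learner == "pursuer" else "pursuer"
        # Sample an opponent from the growing population of past and
        # current policies, so old strategies are never forgotten.
        opponent = rng.choice(populations[other])
        new_policy = train_stage(learner, opponent, stage)
        populations[learner].append(new_policy)  # population only grows
    return populations

pops = amspb(num_stages=6)
print(len(pops["pursuer"]), len(pops["evader"]))  # → 4 4
```

Sampling opponents from the whole population, rather than always facing the latest policy, is what counters the non-stationarity and catastrophic forgetting of naive self-play: a learner that exploits only the current opponent is immediately re-tested against older strategies it might otherwise forget how to beat.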