🤖 AI Summary
Unmanned aerial vehicles (UAVs) operating in dynamic airspace suffer from policy fragility, inaccurate value estimation, and unsafe decision-making under out-of-distribution (OOD) adversarial attacks, such as GPS spoofing, that perturb observational inputs.
Method: This paper proposes a curriculum-guided robust reinforcement learning framework. We first formally define the policy fragility boundary and establish a theoretical foundation for robustness based on the boundedness of the value function distribution. An expert-guided critic alignment mechanism is introduced that minimizes the Wasserstein distance between value distributions to mitigate catastrophic forgetting. Furthermore, we integrate curriculum learning over progressively stronger adversarial perturbations generated with projected gradient descent (PGD).
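The perturbation curriculum can be sketched as a PGD attacker whose budget grows across training stages: at each stage the attacker takes signed gradient steps on the observation to drive down the critic's value estimate, projecting back into an L-infinity ball. This is a minimal illustrative sketch, not the paper's implementation; the function names, the toy quadratic value function, and the curriculum schedule are all assumptions.

```python
import numpy as np

def pgd_perturb(obs, value_grad, eps, alpha=None, steps=10):
    """Hypothetical PGD attack on an observation.

    value_grad(obs) returns dV/d(obs); the attacker steps *against*
    the value gradient (to minimize the estimated value), projecting
    the perturbation back onto the L-infinity ball of radius eps.
    """
    alpha = alpha if alpha is not None else eps / steps
    delta = np.zeros_like(obs)
    for _ in range(steps):
        g = value_grad(obs + delta)
        delta -= alpha * np.sign(g)        # descend the value estimate
        delta = np.clip(delta, -eps, eps)  # project onto the eps-ball
    return obs + delta

# Curriculum: perturbation strength grows across training stages
# (schedule values are illustrative).
curriculum = [0.0, 0.05, 0.1, 0.2]

# Toy quadratic value V(s) = -||s||^2, with analytic gradient -2s.
value_grad = lambda s: -2.0 * s
obs = np.array([0.5, -0.3, 1.0])
for eps in curriculum:
    adv_obs = pgd_perturb(obs, value_grad, eps)  # train on adv_obs here
```

Each stage trains the agent on observations perturbed within the current budget before the budget is raised, which is the "incremental" adaptation the framework relies on.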
Results: Evaluated on a 3D dynamic obstacle-avoidance task, our approach achieves a 15% increase in cumulative reward and reduces collision incidents by over 30% compared to baseline methods, demonstrating significantly improved generalization and operational safety.
📝 Abstract
Reinforcement learning (RL) policies deployed in safety-critical systems, such as unmanned aerial vehicle (UAV) navigation in dynamic airspace, are vulnerable to out-of-distribution (OOD) adversarial attacks in the observation space. These attacks induce distributional shifts that significantly degrade value estimation, leading to unsafe or suboptimal decision making and rendering the existing policy fragile. To address this vulnerability, we propose an antifragile RL framework designed to adapt to a curriculum of incrementally stronger adversarial perturbations. The framework introduces a simulated attacker that incrementally increases the strength of observation-space perturbations, enabling the RL agent to adapt and generalize across a wider range of OOD observations and to anticipate previously unseen attacks. We begin with a theoretical characterization of fragility, formally defining catastrophic forgetting as a monotonic divergence in value function distributions with increasing perturbation strength. Building on this, we define antifragility as the boundedness of such value shifts and derive adaptation conditions under which forgetting is stabilized. Our method enforces these bounds through iterative expert-guided critic alignment, minimizing the Wasserstein distance between value distributions across incrementally perturbed observations. We empirically evaluate the approach in a UAV deconfliction scenario involving dynamic 3D obstacles. Results show that the antifragile policy consistently outperforms standard and robust RL baselines when subjected to both projected gradient descent (PGD) and GPS spoofing attacks, achieving up to 15% higher cumulative reward and over 30% fewer conflict events. These findings demonstrate the practical and theoretical viability of antifragile reinforcement learning for secure and resilient decision-making in environments with evolving threat scenarios.
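The critic alignment step above minimizes a Wasserstein distance between value distributions under clean and perturbed observations. For equal-sized sets of scalar value samples, the empirical 1-Wasserstein distance reduces to the mean absolute difference of the sorted samples; the sketch below shows that quantity as an alignment loss. Function and variable names are hypothetical, not from the paper.

```python
import numpy as np

def w1_distance(values_clean, values_perturbed):
    """Empirical 1-Wasserstein distance between two equal-sized sets of
    scalar value estimates: the mean absolute difference of the sorted
    samples (the closed form for 1-D empirical distributions)."""
    a = np.sort(np.asarray(values_clean, dtype=float))
    b = np.sort(np.asarray(values_perturbed, dtype=float))
    return float(np.mean(np.abs(a - b)))

# Identical distributions give zero distance; a constant shift by c
# gives distance c, so bounding this loss bounds the value shift.
v = np.array([1.0, 2.0, 3.0, 4.0])
print(w1_distance(v, v))        # 0.0
print(w1_distance(v, v + 0.5))  # 0.5
```

Keeping this distance bounded as perturbation strength grows is exactly the boundedness condition the abstract uses to define antifragility.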