Curriculum-Guided Antifragile Reinforcement Learning for Secure UAV Deconfliction under Observation-Space Attacks

📅 2025-06-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Unmanned aerial vehicles (UAVs) operating in dynamic airspace suffer from policy fragility, inaccurate value estimation, and unsafe decision-making under out-of-distribution (OOD) adversarial attacks—e.g., GPS spoofing—that perturb observational inputs. Method: This paper proposes a curriculum-guided robust reinforcement learning framework. We first formally define the policy fragility boundary and establish a theoretical foundation for robustness based on boundedness of the value function distribution. An expert-guided critic alignment mechanism is introduced, minimizing Wasserstein distance to mitigate catastrophic forgetting. Furthermore, we integrate progressive adversarial perturbation–based curriculum learning with projected gradient descent. Results: Evaluated on a 3D dynamic obstacle-avoidance task, our approach achieves a 15% increase in cumulative reward and reduces collision incidents by over 30% compared to baseline methods, demonstrating significantly improved generalization and operational safety.

📝 Abstract
Reinforcement learning (RL) policies deployed in safety-critical systems, such as unmanned aerial vehicle (UAV) navigation in dynamic airspace, are vulnerable to out-of-distribution (OOD) adversarial attacks in the observation space. These attacks induce distributional shifts that significantly degrade value estimation, leading to unsafe or suboptimal decision-making and rendering the existing policy fragile. To address this vulnerability, we propose an antifragile RL framework designed to adapt against a curriculum of incremental adversarial perturbations. The framework introduces a simulated attacker that incrementally increases the strength of observation-space perturbations, enabling the RL agent to adapt, generalize across a wider range of OOD observations, and anticipate previously unseen attacks. We begin with a theoretical characterization of fragility, formally defining catastrophic forgetting as a monotonic divergence in value-function distributions with increasing perturbation strength. Building on this, we define antifragility as the boundedness of such value shifts and derive adaptation conditions under which forgetting is stabilized. Our method enforces these bounds through iterative expert-guided critic alignment using Wasserstein distance minimization across incrementally perturbed observations. We empirically evaluate the approach in a UAV deconfliction scenario involving dynamic 3D obstacles. Results show that the antifragile policy consistently outperforms standard and robust RL baselines under both projected gradient descent (PGD) and GPS spoofing attacks, achieving up to 15% higher cumulative reward and over 30% fewer conflict events. These findings demonstrate the practical and theoretical viability of antifragile reinforcement learning for secure, resilient decision-making in environments with evolving threats.
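The curriculum of incremental observation-space perturbations described above can be illustrated with a toy PGD-style attacker. This is a minimal NumPy sketch under assumptions of our own (a hand-written quadratic stand-in for the critic and finite-difference gradients), not the paper's implementation:

```python
import numpy as np

def pgd_perturb(obs, value_fn, eps, steps=10, alpha=None):
    """Projected gradient attack on the observation: push the critic's
    value estimate down, staying inside an l-infinity ball of radius eps.
    Gradients are approximated by central finite differences because the
    toy value_fn here is a plain NumPy function."""
    alpha = alpha if alpha is not None else eps / 4
    delta = np.zeros_like(obs)
    h = 1e-5
    for _ in range(steps):
        # finite-difference gradient of value_fn at the perturbed observation
        g = np.zeros_like(obs)
        for i in range(obs.size):
            e = np.zeros_like(obs)
            e[i] = h
            g[i] = (value_fn(obs + delta + e) - value_fn(obs + delta - e)) / (2 * h)
        # step against the value gradient (attacker minimizes value), then project
        delta = np.clip(delta - alpha * np.sign(g), -eps, eps)
    return obs + delta

# Curriculum: progressively larger perturbation budgets
value_fn = lambda o: -np.sum(o ** 2)      # toy critic, highest value at the origin
obs = np.array([0.5, -0.3])
for eps in [0.05, 0.1, 0.2]:              # incremental attack strength
    adv = pgd_perturb(obs, value_fn, eps)
    print(eps, value_fn(adv) <= value_fn(obs))
```

In the paper's setting the agent would be retrained at each curriculum stage before the budget grows; the sketch only shows the attacker side of that loop.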
Problem

Research questions and friction points this paper is trying to address.

Addresses UAV navigation vulnerabilities to adversarial observation-space attacks
Proposes antifragile RL to adapt against incremental adversarial perturbations
Ensures secure UAV deconfliction under evolving threat scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Antifragile RL framework against adversarial perturbations
Simulated attacker for incremental observation-space perturbations
Wasserstein distance minimization for critic alignment
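The Wasserstein-based critic alignment in the last bullet can be sketched for the one-dimensional case, where the Wasserstein-1 distance between two equal-size empirical samples reduces to the mean absolute difference of the sorted values. A minimal sketch, with illustrative sample sets of our own choosing (the paper aligns a student critic to an expert critic over perturbed observations):

```python
import numpy as np

def wasserstein1(a, b):
    """Empirical 1-D Wasserstein-1 distance between two equal-size sample
    sets: mean absolute difference of the sorted samples (quantile coupling)."""
    return np.mean(np.abs(np.sort(a) - np.sort(b)))

# Value estimates from an expert critic vs. an adapting critic on perturbed observations
rng = np.random.default_rng(0)
expert = rng.normal(10.0, 1.0, 1000)
student = rng.normal(8.0, 1.5, 1000)
print(wasserstein1(expert, student))  # grows with mean shift and spread mismatch
```

Minimizing this quantity over the student critic's parameters (e.g. as an auxiliary loss) is what bounds the value-distribution divergence that the abstract identifies with catastrophic forgetting; for unequal sample sizes, `scipy.stats.wasserstein_distance` handles the general case.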
Deepak Kumar Panda
Faculty of Engineering and Applied Sciences, Cranfield University, MK43 0AL Cranfield, U.K.
Adolfo Perrusquia
Faculty of Engineering and Applied Sciences, Cranfield University, MK43 0AL Cranfield, U.K.
Weisi Guo
Professor & Head of Centre - Cranfield University; Visiting Fellow - Alan Turing Inst.
Graph Signal Processing · Networks · Adversarial AI · Autonomy · Social Physics