🤖 AI Summary
This study addresses the challenge of maintaining safe separation among small unmanned aircraft systems (sUAS) under GPS degradation or spoofing attacks. The authors model observation corruption as a zero-sum game between the agents and an adversary and derive a closed-form expression for the worst-case perturbation that is accurate to second order, requires no adversarial training, and can be evaluated in time linear in the state dimension. This expression is integrated into a multi-agent reinforcement learning (MARL) policy gradient algorithm. Theoretical analysis shows that, under Kullback-Leibler regularization, the safety performance gap between clean and corrupted observations grows at most linearly with the corruption probability. In a high-density sUAS simulation, the proposed method achieves near-zero collision rates at corruption levels up to 35%, outperforming a baseline policy trained without adversarial perturbations.
📝 Abstract
We address robust separation assurance for small Unmanned Aircraft Systems (sUAS) under GPS degradation and spoofing via Multi-Agent Reinforcement Learning (MARL). In cooperative surveillance, each aircraft (or agent) broadcasts its GPS-derived position; when such position broadcasts are corrupted, the entire observed air traffic state becomes unreliable. We cast this state observation corruption as a zero-sum game between the agents and an adversary: with probability R, the adversary perturbs the observed state to maximally degrade each agent's safety performance. We derive a closed-form expression for this adversarial perturbation, bypassing adversarial training entirely and enabling linear-time evaluation in the state dimension. We show that this expression approximates the true worst-case adversarial perturbation with second-order accuracy. We further bound the safety performance gap between clean and corrupted observations, showing that it degrades at most linearly with the corruption probability under Kullback-Leibler regularization. Finally, we integrate the closed-form adversarial policy into a MARL policy gradient algorithm to obtain a robust counter-policy for the agents. In a high-density sUAS simulation, we observe near-zero collision rates under corruption levels up to 35%, outperforming a baseline policy trained without adversarial perturbations.
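To make the corruption model concrete, here is a minimal Python sketch of an observation channel in which, with a given probability, the observed state is shifted by a closed-form perturbation. The abstract does not state the paper's actual expression, so the sketch substitutes a standard first-order stand-in: the minimizer of a linearized safety score over an L2 ball. The function names, the L2 budget `eps`, and the gradient input `grad` are all assumptions made for illustration; only the idea of a probabilistic, closed-form, linear-time perturbation comes from the text.

```python
import numpy as np


def linearized_worst_case(grad, eps):
    """Closed-form minimizer of grad @ delta subject to ||delta||_2 <= eps.

    Illustrative stand-in only, NOT the paper's expression. `grad` is assumed
    to be the gradient of some differentiable safety score with respect to the
    observed state. The cost is linear in the state dimension.
    """
    grad = np.asarray(grad, dtype=float)
    norm = np.linalg.norm(grad)
    if norm == 0.0:
        # Flat linearization: no direction is preferred, so do not perturb.
        return np.zeros_like(grad)
    return -eps * grad / norm


def corrupt_observation(state, grad, eps, corruption_prob, rng):
    """Return the observation an agent receives.

    With probability `corruption_prob` the state is shifted by the
    perturbation above; otherwise a clean copy is returned.
    """
    state = np.asarray(state, dtype=float)
    if rng.random() < corruption_prob:
        return state + linearized_worst_case(grad, eps)
    return state.copy()


# Hypothetical usage with a 2-D state and a hand-picked gradient.
rng = np.random.default_rng(0)
state = np.array([1.0, 2.0])
grad = np.array([3.0, 4.0])
observed = corrupt_observation(state, grad, eps=0.5, corruption_prob=0.35, rng=rng)
```

Because the perturbation is a single normalization and scaling of the gradient, no inner adversarial optimization loop is needed. That is the property the abstract highlights, although the paper's own expression may differ in form and in the norm used.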