🤖 AI Summary
This work addresses the search–pursuit–interception problem in urban environments under partial observability, where two pursuer UAVs must locate and intercept an evader UAV whose initial position, target location, and behavioral policy are unknown. Formulated as a multi-agent, partially observable pursuit–evasion game, the problem demands robust coordination amid sensory limitations and adversarial uncertainty. We propose a bounded-rational, two-stage neuro-symbolic algorithm: (1) an offline phase employing hierarchical adversarial training to synthesize robust collaborative policies; and (2) an online phase integrating POMDP modeling, deep reinforcement learning, and a lightweight opponent-type classifier to enable real-time inference of the evader's behavior and adaptive response. Experiments demonstrate substantial improvements: a 32.7% higher interception success rate and a 41% reduction in inter-agent communication overhead under stochastic evasion policies. To our knowledge, this is the first work to systematically integrate neuro-symbolic reasoning and bounded-rationality modeling into urban-airspace multi-UAV pursuit–evasion tasks.
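The offline phase described above amounts to iterated best-response training: pursuer and evader policies are trained against each other in alternation, producing a ladder of increasingly rational opponents. The sketch below illustrates only that loop structure; `train_best_response` is a hypothetical stand-in for the deep-RL routine (the summary does not specify the algorithm), stubbed out so the skeleton runs.

```python
# Hypothetical sketch of k-level iterative adversarial training.
# train_best_response stands in for a deep-RL training routine run
# against a frozen opponent; here it is stubbed so the loop runs.

def train_best_response(role, opponent_policy):
    # Placeholder: would train `role` with RL against a fixed opponent.
    return f"{role}-policy-vs-({opponent_policy})"

def build_k_level_policies(k):
    """Alternate pursuer-team and evader training so that each level
    is a best response to the previous level's opponent."""
    evader = "level0-random-evader"  # level-0: non-strategic baseline
    pursuers, evaders = [], [evader]
    for level in range(1, k + 1):
        pursuer = train_best_response("pursuer", evader)  # level-k pursuer
        evader = train_best_response("evader", pursuer)   # level-k evader
        pursuers.append(pursuer)
        evaders.append(evader)
    return pursuers, evaders

pursuers, evaders = build_k_level_policies(3)
print(len(pursuers), len(evaders))  # 3 pursuer levels, 4 evader levels (incl. level 0)
```

The resulting pursuer policies form the library that the online opponent classifier later selects from.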
📝 Abstract
We consider a scenario where a team of two unmanned aerial vehicles (UAVs) pursues an evader UAV within an urban environment. Each agent has a limited view of its environment, as buildings can occlude its field of view. Additionally, the pursuer team has no prior knowledge of the evader's initial position, final destination, or behavior. Consequently, the team must gather information by searching the environment and then track the evader to eventually intercept it. To solve this multi-player, partially observable pursuit-evasion game, we develop a two-phase neuro-symbolic algorithm centered on the principle of bounded rationality. First, we devise an offline approach using deep reinforcement learning to progressively train adversarial policies for the pursuer team against fictitious evaders. This creates $k$ levels of rationality for each agent in preparation for the online phase. Then, we employ an online classification algorithm to determine a "best guess" of our current opponent from the set of iteratively trained strategic agents and apply the best-response player. Using this schema, we improve average performance when facing a random evader in our environment.
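The online phase can be pictured as a simple type-classification step followed by a policy lookup: maintain a score for each trained evader level given the actions observed so far, then deploy the pursuer policy trained against the most likely level. The sketch below is a minimal illustration under assumed toy action-likelihood models; the model functions, action names, and policy labels are all hypothetical, not the paper's actual classifier.

```python
# Hypothetical sketch of the online phase: infer which trained evader
# "type" we face via log-likelihood, then look up the best response.
import math

def classify_and_respond(observed_actions, evader_models, pursuer_responses):
    """Pick the evader level whose model best explains the observed
    actions, then return the pursuer policy trained against it."""
    best_level, best_loglik = None, -math.inf
    for level, model in evader_models.items():
        # model(a) returns an assumed likelihood of action a for this type
        loglik = sum(math.log(model(a)) for a in observed_actions)
        if loglik > best_loglik:
            best_level, best_loglik = level, loglik
    return best_level, pursuer_responses[best_level]

# Toy models: level 0 acts uniformly at random; level 1 prefers hiding.
evader_models = {
    0: lambda a: 0.5,
    1: lambda a: 0.8 if a == "hide" else 0.2,
}
pursuer_responses = {0: "sweep-search", 1: "corner-and-flush"}

level, policy = classify_and_respond(
    ["hide", "hide", "flee"], evader_models, pursuer_responses)
print(level, policy)  # → 1 corner-and-flush
```

Because the classifier re-runs as new observations arrive, the pursuer team can switch responses online if the evader's apparent rationality level changes.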