Evader-Agnostic Team-Based Pursuit Strategies in Partially-Observable Environments

📅 2025-11-08
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses the search–pursuit–interception problem in urban environments under partial observability, where two pursuer UAVs must locate and intercept an evader UAV with unknown initial position, target location, and behavioral policy. Formulated as a multi-agent, partially observable pursuit–evasion game, the problem demands robust coordination amid sensory limitations and adversarial uncertainty. We propose a bounded-rational, two-stage neuro-symbolic algorithm: (1) an offline phase employing hierarchical adversarial training to synthesize robust collaborative policies; and (2) an online phase integrating PO-MDP modeling, deep reinforcement learning, and a lightweight opponent-type classifier to enable real-time inference of the evader’s dynamic behavior and adaptive response. Experiments demonstrate substantial improvements: +32.7% interception success rate and 41% reduction in inter-agent communication overhead under stochastic evasion policies. To our knowledge, this is the first work to systematically integrate neuro-symbolic reasoning and bounded rationality modeling into urban airspace multi-UAV pursuit–evasion tasks.

📝 Abstract
We consider a scenario in which a team of two unmanned aerial vehicles (UAVs) pursues an evader UAV within an urban environment. Each agent has a limited view of its environment, as buildings can occlude its field of view. Additionally, the pursuer team is agnostic about the evader in terms of its initial and final locations and its behavior. Consequently, the team needs to gather information by searching the environment and then track the evader to eventually intercept it. To solve this multi-player, partially-observable pursuit-evasion game, we develop a two-phase neuro-symbolic algorithm centered on the principle of bounded rationality. First, we devise an offline approach using deep reinforcement learning to progressively train adversarial policies for the pursuer team against fictitious evaders. This creates $k$ levels of rationality for each agent in preparation for the online phase. Then, we employ an online classification algorithm to determine a "best guess" of our current opponent from the set of iteratively-trained strategic agents and apply the best response. Using this schema, we improved average performance when facing a random evader in our environment.
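The online "best guess" step described in the abstract can be sketched as a belief update over the $k$ pre-trained evader levels followed by selection of the matching pursuer response. The Bayesian likelihood update and all function names below are illustrative assumptions, not the authors' actual classifier:

```python
# Hypothetical sketch of the online phase: maintain a belief over the k
# iteratively-trained evader "rationality levels", update it from an observed
# evader action, and respond with the pursuer policy trained against the most
# likely level. The Bayesian update rule is an assumption for illustration.
import numpy as np

def classify_and_respond(belief, evader_obs, evader_action,
                         evader_models, pursuer_policies):
    """One step of opponent-type inference and best response.

    belief          : length-k prior over evader levels (sums to 1)
    evader_models   : k candidate policies, each obs -> action distribution
    pursuer_policies: pursuer policy trained against each evader level
    """
    # Likelihood of the observed evader action under each candidate level.
    likelihoods = np.array(
        [model(evader_obs)[evader_action] for model in evader_models]
    )
    # Bayesian belief update; fall back to the prior if all likelihoods vanish.
    posterior = belief * likelihoods
    if posterior.sum() == 0.0:
        posterior = belief.copy()
    posterior = posterior / posterior.sum()
    # "Best guess" of the current opponent: apply the response trained
    # against the most probable evader level.
    best_guess = int(np.argmax(posterior))
    return posterior, pursuer_policies[best_guess]
```

Repeating this update each time the evader is observed lets the team sharpen its guess during the track phase rather than committing to one opponent model up front.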
Problem

Research questions and friction points this paper is trying to address.

Developing UAV pursuit strategies without evader knowledge
Addressing partial observability in urban pursuit scenarios
Creating adaptive algorithms for dynamic opponent classification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-phase neuro-symbolic algorithm with bounded rationality
Offline deep reinforcement learning trains adversarial policies
Online classification determines best response to opponent
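The offline contribution above, progressively training adversarial policies to build $k$ levels of rationality, can be outlined as an alternating best-response loop. Here `train_against` stands in for a deep-RL training routine and is an assumption, not the authors' exact procedure:

```python
# Illustrative outline of offline level-k adversarial training: alternately
# train pursuer and evader policies, each best-responding to the previous
# level's opponent. `train_against` is a placeholder for a deep-RL routine.
def build_rationality_levels(k, train_against, level0_evader):
    evaders, pursuers = [level0_evader], []
    for _ in range(k):
        # Pursuer team best-responds to the most sophisticated evader so far.
        pursuers.append(train_against(opponent=evaders[-1], role="pursuer"))
        # Evader best-responds to that pursuer team, forming the next level.
        evaders.append(train_against(opponent=pursuers[-1], role="evader"))
    return evaders, pursuers
```

The resulting pool of level-indexed policies is what the online classifier would select among when inferring the opponent's type.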
Addison Kalanther
University of California Berkeley
Daniel Bostwick
University of California Berkeley
C. Maheshwari
University of California Berkeley
Shankar Sastry
University of California
robotics · control · hybrid systems · cyber security · embedded systems