Equilibrium Policy Generalization: A Reinforcement Learning Framework for Cross-Graph Zero-Shot Generalization in Pursuit-Evasion Games

📅 2025-11-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
In Pursuit-Evasion Games (PEGs), dynamic graph structures hinder policy generalization and necessitate frequent fine-tuning. Method: This paper proposes the first reinforcement learning framework enabling cross-graph zero-shot transfer. It introduces Nash equilibrium policies as supervisory signals for training, integrates sequence modeling for joint policy decomposition, and designs distance-based graph-invariant features alongside an equilibrium-inspired heuristic to enhance multi-agent scalability. Contributions/Results: (1) Achieves zero-shot generalization to unseen graph topologies and exit configurations; (2) On real-world graph datasets, the pursuit policy attains zero-shot performance comparable to state-of-the-art (SOTA) methods *after* fine-tuning—eliminating the need for task-specific adaptation and significantly reducing deployment overhead.

📝 Abstract
Equilibrium learning in adversarial games is an important topic widely examined in game theory and reinforcement learning (RL). Pursuit-evasion games (PEGs), an important class of real-world games arising in robotics and security, require exponential time to solve exactly. When the underlying graph structure varies, even state-of-the-art RL methods require recomputation, or at least fine-tuning, which can be time-consuming and impair real-time applicability. This paper proposes an Equilibrium Policy Generalization (EPG) framework that learns a generalized policy with robust cross-graph zero-shot performance. In the context of PEGs, our framework applies to both the pursuer and evader sides, in both no-exit and multi-exit scenarios; to our knowledge, these two generalizability properties are the first to appear in this domain. The core idea of EPG is to train an RL policy across different graph structures against the equilibrium policy of each individual graph. To construct an equilibrium oracle for single-graph policies, we present a dynamic programming (DP) algorithm that provably generates a pure-strategy Nash equilibrium with near-optimal time complexity. To guarantee scalability with respect to the number of pursuers, we further extend DP and RL by designing a grouping mechanism and a sequence model for joint policy decomposition, respectively. Experimental results show that, using equilibrium guidance together with a distance feature proposed for cross-graph PEG training, the EPG framework achieves desirable zero-shot performance on various unseen real-world graphs. Moreover, when trained under an equilibrium heuristic proposed for graphs with exits, our generalized pursuer policy can even match the performance of fine-tuned policies from state-of-the-art PEG methods.
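The abstract describes the distance feature for cross-graph training only at a high level. As an illustrative sketch (not the paper's definition), graph-invariant observations can be built from BFS hop distances, which depend only on relative positions rather than node labels or topology size. The function names `bfs_distances` and `distance_features` and the exact feature layout below are assumptions for illustration.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from `source` on an unweighted graph.

    `adj` maps each node to a list of neighbors; unreachable nodes
    keep distance float('inf').
    """
    dist = {v: float('inf') for v in adj}
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] == float('inf'):
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def distance_features(adj, evader, pursuers, exits=()):
    """A graph-invariant observation for the evader: hop distances to
    each pursuer and each exit, independent of node identities."""
    dist = bfs_distances(adj, evader)
    return [dist[p] for p in pursuers], [dist[x] for x in exits]
```

Because the features are expressed purely in distances, the same policy network can consume them on any graph, which is the property zero-shot transfer relies on.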
Problem

Research questions and friction points this paper is trying to address.

Achieving cross-graph zero-shot generalization in pursuit-evasion games
Eliminating recomputation needs when graph structures vary
Developing equilibrium-guided policies for pursuer and evader roles
Innovation

Methods, ideas, or system contributions that make the work stand out.

Trains RL policies across diverse graph structures
Uses dynamic programming for Nash equilibrium computation
Implements grouping mechanism for scalable multi-agent training
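The DP equilibrium oracle is described only abstractly here. As a hedged illustration of the general idea, the sketch below performs backward induction over joint (pursuer, evader) positions in a simplified turn-based, single-pursuer game on an unweighted graph, yielding minimax capture times; in this restricted setting the induced strategies form a pure-strategy equilibrium. It is not the paper's grouped multi-pursuer algorithm, and `pursuit_values` is an assumed name.

```python
import itertools

def pursuit_values(adj, max_iters=100):
    """V[(p, e)]: pursuer moves needed to capture the evader when the
    pursuer moves next; float('inf') means the evader evades forever."""
    nodes = list(adj)
    INF = float('inf')
    V = {(p, e): (0 if p == e else INF) for p in nodes for e in nodes}
    for _ in range(max_iters):
        changed = False
        for p, e in itertools.product(nodes, nodes):
            if p == e:
                continue
            # Pursuer minimizes over its moves; the evader then replies
            # (move to a neighbor or stay put) to maximize capture time.
            best = INF
            for p2 in adj[p]:
                if p2 == e:
                    cand = 1  # capture by stepping onto the evader
                else:
                    cand = 1 + max(V[(p2, e2)] for e2 in adj[e] + [e])
                best = min(best, cand)
            if best < V[(p, e)]:
                V[(p, e)] = best
                changed = True
        if not changed:
            break
    return V
```

For example, on a 3-node path the pursuer corners the evader (finite values everywhere), while on a 4-cycle an evader starting at distance two is never caught, so its value stays infinite.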
Runyu Lu
School of Artificial Intelligence, University of Chinese Academy of Sciences
Peng Zhang
School of Future Technology, Dalian University of Technology
Ruochuan Shi
Institute of Automation, Chinese Academy of Sciences, University of Chinese Academy of Sciences
Yuanheng Zhu
Institute of Automation, Chinese Academy of Sciences
Dongbin Zhao
Institute of Automation, Chinese Academy of Sciences
Deep Reinforcement Learning, Adaptive Dynamic Programming, Game AI, Smart driving, robotics
Yang Liu
School of Future Technology, Dalian University of Technology
Dong Wang
School of Information and Communication Engineering, Dalian University of Technology
Cesare Alippi
Università della Svizzera italiana, Politecnico di Milano
Machine learning, Graph Deep Learning, intelligence in embedded systems, computational intelligence