Successor Features for Transfer in Alternating Markov Games

📅 2025-07-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
In zero-sum, fully observable, turn-based alternating Markov games, significant task disparities render conventional value- or equilibrium-based transfer ineffective. To address this, we introduce Successor Features into multi-agent reinforcement learning and propose Game Generalized Policy Improvement (GGPI), a novel algorithm enabling one-shot policy transfer across tasks. We theoretically derive an upper bound on transfer error that decreases with task similarity. Empirical evaluation in pursuit-evasion games demonstrates that GGPI substantially improves policy success rates and path efficiency under diverse initial conditions, achieving high-reward interactions and outperforming baseline methods. Our core contribution is the first extension of the successor features framework to competitive multi-agent settings, establishing a new paradigm for multi-agent knowledge transfer that is generalizable, interpretable, and supported by rigorous theoretical guarantees.

📝 Abstract
This paper explores successor features for knowledge transfer in zero-sum, complete-information, turn-based games. Prior research in single-agent systems has shown that successor features can provide a "jump start" for agents facing new tasks with varying reward structures. However, knowledge transfer in games typically relies on value and equilibrium transfers, which depend heavily on the similarity between tasks; this reliance can lead to failures when the tasks differ significantly. To address this issue, this paper applies successor features to games and presents a novel algorithm called Game Generalized Policy Improvement (GGPI), designed to address Markov games in multi-agent reinforcement learning. The proposed algorithm enables the transfer of learned values and policies across games. An upper bound on the transfer error is derived as a function of the similarity between tasks. Through experiments with a turn-based pursuer-evader game, we demonstrate that the GGPI algorithm can generate high-reward interactions and one-shot policy transfer. When further tested across a wider set of initial conditions, the GGPI algorithm achieves higher success rates with improved path efficiency compared to the baseline algorithms.
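The core mechanism the abstract describes, generalized policy improvement over successor features, can be sketched in a few lines. The sketch below is illustrative only (tabular successor features, random data, made-up variable names), not the paper's implementation: each source policy's successor features `psi_i(s, a)` are dotted with the new task's reward weights `w` to evaluate that policy on the new task, and the agent acts greedily over the best of them; for a zero-sum turn-based game, the minimizing player would act adversarially instead.

```python
import numpy as np

def ggpi_action(psis, w, state, maximize=True):
    """One-shot policy transfer via generalized policy improvement
    over successor features (illustrative sketch, not the paper's code).

    psis     : list of arrays of shape (n_states, n_actions, d);
               psis[i][s, a] are the successor features of source policy i.
    w        : array of shape (d,); reward weights of the new task,
               assuming a linear reward r(s, a) = phi(s, a) . w.
    maximize : True for the max player, False for the min player
               in a zero-sum turn-based game.
    """
    # Evaluate each source policy on the new task: Q_i(s, a) = psi_i(s, a) . w
    q = np.stack([psi[state] @ w for psi in psis])  # shape (n_policies, n_actions)
    if maximize:
        return int(np.argmax(q.max(axis=0)))  # greedy over the best source policy
    return int(np.argmin(q.min(axis=0)))      # min player acts adversarially

# Tiny usage example with random successor features for 2 source policies
rng = np.random.default_rng(0)
psis = [rng.random((4, 3, 5)) for _ in range(2)]  # 4 states, 3 actions, d = 5
w = rng.random(5)                                 # new task's reward weights
a = ggpi_action(psis, w, state=1)
```

No new learning is needed at transfer time: the dot product re-evaluates every cached policy under the new reward in one shot, which is what makes the transfer "one-shot" in the abstract's sense.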
Problem

Research questions and friction points this paper is trying to address.

Transferring knowledge in zero-sum turn-based games
Overcoming task similarity reliance in game transfers
Enabling one-shot policy transfer across Markov games
Innovation

Methods, ideas, or system contributions that make the work stand out.

Successor features for knowledge transfer
Game Generalized Policy Improvement algorithm
Transfer learning values and policies
Sunny Amatya
Arizona State University
Robotics, HRI, Soft-robotics, Artificial Intelligence, Multi-agent systems
Yi Ren
School for Engineering of Matter, Transport, and Energy, Arizona State University, Tempe, AZ, 85287, USA
Zhe Xu
School for Engineering of Matter, Transport, and Energy, Arizona State University, Tempe, AZ, 85287, USA
Wenlong Zhang
School of Manufacturing Systems and Networks, Arizona State University, Mesa, AZ, 85212, USA