🤖 AI Summary
This work addresses the ambiguity and data scarcity inherent in reward function identification for two-player zero-sum games by proposing a unified inverse reward learning framework. Leveraging observed agent policies, the framework reconstructs the underlying reward functions in both entropy-regularized static matrix games and dynamic Markov games. The key innovation lies in establishing, for the first time, identifiability conditions for linear reward functions under the quantal response equilibrium, and in designing a general-purpose learning algorithm applicable to both static and dynamic settings. By integrating quantal response equilibrium, entropy regularization, and maximum likelihood estimation, the method achieves sample-efficient learning. Theoretical analysis confirms the algorithm's reliability and sample efficiency, while numerical experiments demonstrate its effectiveness in competitive decision-making environments.
📄 Abstract
Estimating the unknown reward functions driving agents' behaviors is of central interest in inverse reinforcement learning and game theory. To tackle this problem, we develop a unified framework for reward function recovery in two-player zero-sum matrix games and Markov games with entropy regularization, where we aim to reconstruct the underlying reward functions given observed players' strategies and actions. This task is challenging due to the inherent ambiguity of inverse problems, the non-uniqueness of feasible rewards, and limited observational data coverage. To address these challenges, we establish the reward function's identifiability using the quantal response equilibrium (QRE) under linear assumptions. Building upon this theoretical foundation, we propose a novel algorithm to learn reward functions from observed actions. Our algorithm works in both static and dynamic settings and is adaptable to incorporate different methods, such as Maximum Likelihood Estimation (MLE). We provide strong theoretical guarantees for the reliability and sample efficiency of our algorithm. Further, we conduct extensive numerical studies to demonstrate the practical effectiveness of the proposed framework, offering new insights into decision-making in competitive environments.
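For intuition about the forward model underlying this inverse problem, the QRE of an entropy-regularized zero-sum matrix game can be approximated by a softmax fixed-point iteration: each player's mixed strategy is a softmax best response to the opponent's at temperature τ. The sketch below is illustrative only; the payoff matrix `A`, temperature `tau`, and plain fixed-point iteration are our assumptions for demonstration, not the paper's reward-learning algorithm.

```python
import numpy as np

def softmax(v):
    """Numerically stable softmax."""
    z = np.exp(v - v.max())
    return z / z.sum()

def qre_fixed_point(A, tau=1.0, iters=500):
    """Approximate the quantal response equilibrium of a zero-sum matrix
    game with payoff matrix A (row player maximizes x^T A y, column
    player minimizes it) via softmax best-response iteration at
    temperature tau. Illustrative sketch, not the paper's method."""
    m, n = A.shape
    x = np.ones(m) / m  # row player's mixed strategy
    y = np.ones(n) / n  # column player's mixed strategy
    for _ in range(iters):
        x = softmax(A @ y / tau)      # softmax response to y
        y = softmax(-A.T @ x / tau)   # softmax response to x
    return x, y

# Matching pennies: the unique QRE is uniform play for both players.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x, y = qre_fixed_point(A)
```

Given trajectories of actions drawn from such equilibrium strategies, the inverse task studied here is to recover the entries (or linear parameters) of `A`, e.g. by MLE over observed action frequencies.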