🤖 AI Summary
This work addresses the challenges of multi-agent reinforcement learning in partially observable Markov potential games, where partial observability, decentralized information, and the curse of dimensionality hinder effective coordination. To tackle these issues, the authors propose a natural policy gradient method that combines the common information framework with an internal-state mechanism. By employing finite-state controllers to compress the observation history, the approach reduces the complexity of the policy space while provably approximating Nash equilibria. Theoretically, the study establishes the first non-asymptotic convergence bound for such an algorithm, decomposing the error into a statistical estimation term and a controller approximation term, which makes the bound interpretable. Empirical results across multiple partially observable environments show that the proposed method significantly outperforms baselines that rely solely on the current observation, confirming both its practical efficacy and its theoretical advantages.
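Schematically, the two-term decomposition described above can be written as follows; the notation here is illustrative only (the symbols $\varepsilon_{\mathrm{stat}}$ and $\varepsilon_{\mathrm{approx}}$ are placeholders, not the paper's):

```latex
\[
\text{NE-gap}(\hat{\pi})
\;\lesssim\;
\underbrace{\varepsilon_{\mathrm{stat}}}_{\substack{\text{statistical estimation error,}\\ \text{as in standard Markov potential games}}}
\;+\;
\underbrace{\varepsilon_{\mathrm{approx}}}_{\substack{\text{approximation error from}\\ \text{finite-state controllers}}}
\]
```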
📝 Abstract
This letter studies multi-agent reinforcement learning in partially observable Markov potential games. Solving this problem is challenging due to partial observability, decentralized information, and the curse of dimensionality. To address the first two challenges, we leverage the common information framework, which allows agents to act based on both shared and local information. To ensure tractability, we then study an internal state that compresses accumulated information, preventing it from growing unboundedly over time. We implement an internal state-based natural policy gradient method to find Nash equilibria of the Markov potential game. Our main contribution is a non-asymptotic convergence bound for this method. The bound decomposes into two interpretable components: a statistical error term that also arises in standard Markov potential games, and an approximation error capturing the use of finite-state controllers. Finally, simulations across multiple partially observable environments demonstrate that the proposed method using finite-state controllers consistently outperforms the setting where only the current observation is used.
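To make the internal-state idea concrete, here is a minimal sketch of a finite-state controller: the observation history is compressed into a bounded memory state `z`, and the policy conditions only on `(z, obs)`, so its parameter count stays fixed no matter how long the history grows. All sizes, names, and the deterministic memory-update table are illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem sizes -- placeholders, not taken from the paper.
N_OBS, N_ACT, N_MEM = 4, 3, 5  # observations, actions, internal (memory) states

class FiniteStateController:
    """Compresses the observation history into a bounded internal state z.

    The action policy pi(a | z, obs) depends only on the current internal
    state and observation, so memory use is O(1) in the history length.
    """

    def __init__(self):
        # Logits parameterizing the action policy pi(a | z, obs).
        self.theta = rng.normal(size=(N_MEM, N_OBS, N_ACT))
        # Deterministic memory update z' = f(z, obs); a stochastic
        # transition table would work just as well.
        self.update = rng.integers(N_MEM, size=(N_MEM, N_OBS))
        self.z = 0  # initial internal state

    def act(self, obs):
        # Softmax action distribution conditioned on (z, obs).
        logits = self.theta[self.z, obs]
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        action = rng.choice(N_ACT, p=probs)
        # Fold the new observation into the bounded internal state.
        self.z = self.update[self.z, obs]
        return int(action)

fsc = FiniteStateController()
history = rng.integers(N_OBS, size=1000)  # an arbitrarily long observation stream
actions = [fsc.act(o) for o in history]
# Only z (one integer) is carried forward, however long the stream gets.
```

In the method studied here, the parameters of such controllers would be the quantities updated by the natural policy gradient step; this sketch only illustrates the history-compression mechanism itself.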