Stochastic Games with Limited Public Memory

📅 2025-05-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper studies how much memory near-optimal strategies need in two-player zero-sum stochastic games under the long-run average payoff criterion. Mertens and Neyman (1981) showed that uniform ε-optimal strategies exist that use at most O(n) memory states in the first n stages; the authors improve this bound to O(log n). They further show that there are uniform ε-optimal strategies whose memory updating and action selection are time-independent and that, with probability close to 1, use at most O(log n) memory states up to every stage n. Finally, they show that this result does not extend to bounded public memory, even when time-dependent updating and action selection are allowed: in the Big Match, any strategy of Player 1 that uses a finite public memory fails to guarantee a payoff greater than ε in sufficiently long games, although some strategies of Player 1 guarantee more than 1/2 − ε.

📝 Abstract

We study the memory resources required for near-optimal play in two-player zero-sum stochastic games with the long-run average payoff. Although optimal strategies may not exist in such games, near-optimal strategies always do. Mertens and Neyman (1981) proved that in any stochastic game, for any $\varepsilon>0$, there exist uniform $\varepsilon$-optimal memory-based strategies -- i.e., strategies that are $\varepsilon$-optimal in all sufficiently long $n$-stage games -- that use at most $O(n)$ memory states within the first $n$ stages. We improve this bound on the number of memory states by proving that in any stochastic game, for any $\varepsilon>0$, there exist uniform $\varepsilon$-optimal memory-based strategies that use at most $O(\log n)$ memory states in the first $n$ stages. Moreover, we establish the existence of uniform $\varepsilon$-optimal memory-based strategies whose memory updating and action selection are time-independent and such that, with probability close to 1, for all $n$, the number of memory states used up to stage $n$ is at most $O(\log n)$. This result cannot be extended to strategies with bounded public memory -- even if time-dependent memory updating and action selection are allowed. This impossibility is illustrated in the Big Match -- a well-known stochastic game where the stage payoffs to Player 1 are 0 or 1. Although for any $\varepsilon>0$, there exist strategies of Player 1 that guarantee a payoff exceeding $1/2 - \varepsilon$ in all sufficiently long $n$-stage games, we show that any strategy of Player 1 that uses a finite public memory fails to guarantee a payoff greater than $\varepsilon$ in any sufficiently long $n$-stage game.
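
The abstract describes uniform $\varepsilon$-optimality and the memory bound in words. One common way to write them down for Player 1 is sketched below. The symbols $\gamma_n$, $v_n$, $M_n$, $N$ and $C$ are notation assumed here for illustration, not taken from the paper, and the paper's own definitions may differ in detail (for instance, by comparing against the limit value rather than $v_n$).

```latex
% Assumed notation (not from the paper):
%   \gamma_n(\sigma,\tau) : expected average payoff to Player 1 over the first n stages
%   v_n                   : value of the n-stage game
%   M_n(\sigma)           : number of memory states \sigma uses in the first n stages
%
% Uniform \varepsilon-optimality of a strategy \sigma of Player 1:
\exists N \;\; \forall n \ge N \;\; \forall \tau : \quad
  \gamma_n(\sigma,\tau) \;\ge\; v_n - \varepsilon .
%
% Memory bounds: Mertens--Neyman (1981) versus the improved bound,
% for some constant C that may depend on the game and on \varepsilon:
M_n(\sigma) \le C\,n
\qquad\text{versus}\qquad
M_n(\sigma) \le C \log n .
```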

Problem

Research questions and friction points this paper is trying to address.

Reducing memory states in stochastic games

Improving bounds for near-optimal strategies

Analyzing limitations of finite public memory

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reduces memory states to O(log n)

Ensures time-independent memory updating

Proves finite public memory insufficiency
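
The insufficiency claim can be checked by hand in its simplest special case, a strategy with a single memory state. The sketch below is an illustration and not the paper's proof. It assumes a common formulation of the Big Match that the abstract does not spell out: Player 1 picks Top (continue) or Bottom (absorb), Player 2 picks Left or Right, and the payoffs to Player 1 are (Top, Left) = 1, (Top, Right) = 0, (Bottom, Left) = 0 forever, (Bottom, Right) = 1 forever. The function name and parameters are hypothetical.

```python
def guarantee_upper_bound(p: float, n: int) -> float:
    """Upper bound on what Player 1 guarantees in the n-stage Big Match
    (assumed payoff matrix, see lead-in) when she plays Bottom with the
    same probability p at every stage, i.e. uses one memory state.

    The bound is the worse of her expected average payoffs against two
    simple strategies of Player 2: always Left and always Right.
    """
    # Against always Left: stage t pays 1 only if Player 1 played Top at
    # stages 1..t (probability (1 - p)**t); absorbing pays 0 forever.
    vs_left = sum((1 - p) ** t for t in range(1, n + 1)) / n

    # Against always Right: Top pays 0, Bottom absorbs at payoff 1, so
    # stage t pays 1 exactly when absorption happened at or before stage t.
    vs_right = sum(1 - (1 - p) ** t for t in range(1, n + 1)) / n

    return min(vs_left, vs_right)


if __name__ == "__main__":
    # For each fixed p, the bound shrinks toward 0 as the game gets longer.
    for p in (0.0, 0.01, 0.1, 0.5):
        print(p, [guarantee_upper_bound(p, n) for n in (10, 1_000, 100_000)])
```

For any fixed p the two averages sum to 1, so the bound never exceeds 1/2. With p = 0 always Right holds Player 1 to 0, and with p > 0 always Left drives her average toward 0 as n grows. The paper's result covers all finite public memories, which this one-state computation does not.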

🔎 Similar Papers

Learning in Games with progressive hiding