AI Summary
In discounted infinitely repeated games where players observe only pure-strategy outcomes and thus cannot directly detect deviations from mixed strategies, conventional trigger strategies fail. This work proposes a "test-then-punish" mechanism: players agree on a cooperative mixed strategy and continuously verify behavioral consistency through embedded statistical hypothesis tests, initiating permanent punishment only after accumulating sufficient evidence of deviation. The paper introduces a fault-tolerant equilibrium concept that disregards histories occurring with vanishingly small probability. By employing anytime-valid sequential testing and fixed-batch testing procedures, it constructs Nash and subgame-perfect equilibria, respectively. Under mild conditions, for sufficiently patient players, the mechanism sustains any feasible and individually rational payoff while ensuring finite expected detection time and robustness against arbitrary deviations.
Abstract
We study discounted infinitely repeated games in which players agree on a cooperative mixed action profile but, at each step, observe only the realized pure actions. This form of imperfect monitoring breaks classical trigger strategies, since deviations cannot be identified with certainty. To address this problem, we study how hypothesis testing can be used to sustain cooperation. First, we develop a framework that embeds statistical inference directly into strategic behavior. We introduce relaxed equilibrium notions that allow players to ignore vanishing-probability histories arising from rare but extreme realizations of the monitoring process. Within this framework, we formalize a generic test-then-punish strategy: players commit ex ante to a cooperative mixed action profile, continuously test whether observed play is consistent with this prescription, and permanently switch to punishment once sufficient statistical evidence of deviation accumulates. Under mild conditions on the testing procedure, this construction sustains any feasible and individually rational payoff for sufficiently patient players, yielding a folk-theorem-type result under imperfect monitoring. We then propose two explicit implementations of this strategy. The first relies on anytime-valid sequential tests, providing uniform control of Type I error over an infinite horizon and a finite expected detection time for payoff-relevant deviations. However, this implementation accounts only for stationary deviations and therefore yields only a Nash equilibrium. The second uses testing over fixed-size batches, accommodating arbitrary deviations and achieving a subgame-perfect Nash equilibrium, at the cost of losing global anytime guarantees on false punishments.
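The fixed-batch implementation can be illustrated with a minimal sketch. The batch size, significance level, and function names here (`BATCH`, `ALPHA_B`, `batch_monitor`) are illustrative assumptions, not the paper's construction: the monitor tests each batch of realized pure actions against the agreed mixing probability with an exact two-sided binomial test and punishes after the first rejected batch, so Type I error is controlled per batch rather than uniformly over the infinite horizon.

```python
import math
import random

random.seed(1)

P_COOP  = 0.5    # agreed mixing probability (hypothetical running example)
BATCH   = 200    # fixed batch size
ALPHA_B = 1e-4   # per-batch significance level (no global anytime guarantee)

def binom_two_sided_p(k, n, p):
    """Exact two-sided binomial p-value: total probability, under
    Binomial(n, p), of all outcomes at most as likely as the observed k."""
    pmf = lambda i: math.comb(n, i) * p**i * (1 - p)**(n - i)
    pk = pmf(k)
    return sum(pmf(i) for i in range(n + 1) if pmf(i) <= pk * (1 + 1e-9))

def batch_monitor(play_prob, n_batches=50):
    """Test each fixed-size batch of realized pure actions against P_COOP;
    return the index of the first rejected batch (None if none rejected)."""
    for b in range(n_batches):
        k = sum(random.random() < play_prob for _ in range(BATCH))
        if binom_two_sided_p(k, BATCH, P_COOP) < ALPHA_B:
            return b  # punish from the end of batch b onward
    return None

compliant_batch = batch_monitor(P_COOP)  # usually None: no false punishment
deviating_batch = batch_monitor(0.7)     # a shift to 0.7 is caught quickly
```

Because the batch boundaries are fixed in advance, the continuation strategy after any history is again a test-then-punish strategy, which is what makes a subgame-perfect construction possible; the price is that false-punishment guarantees hold per batch, not uniformly over all stopping times.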