🤖 AI Summary
This paper addresses Kalai and Lehrer's grain of truth problem: predicting the strategies of Bayesian learners in infinitely repeated multi-player games, where previously known strategy classes are either too small to contain all computable strategies or fail to contain the Bayes-optimal responses to priors over the class.
Method: We construct a generalized class of computable strategies that contains a grain of truth, combining computability theory, Bayesian inference, and Thompson sampling. The framework also supports self-predictive policies that avoid explicit planning, and it admits arbitrarily close computable approximations for practical deployment.
Contribution/Results: Theoretically, the framework recovers the classical convergence results of Kalai and Lehrer when the repeated stage game is known, and agents using Thompson sampling converge to ε-Nash equilibria in arbitrary unknown computable multi-agent environments. Crucially, the strategy class contains all computable strategies as well as the Bayes-optimal responses to every reasonable prior over the class, so beliefs about strategy choice can remain mutually consistent with Bayesian inference. This yields the first formal and general solution to the grain of truth problem: each player's prior assigns positive probability to the other players' actual play, so Bayesian updating lets players learn to predict each other's strategies.
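For intuition, here is a minimal, self-contained sketch (our illustration, not the paper's construction) of why a grain of truth enables prediction: when the learner's prior puts positive weight on the opponent's actual strategy, posterior predictions merge with the truth. The toy strategy class, prior, and horizon below are assumptions made for the example.

```python
# Toy illustration of Bayesian merging under a grain of truth.
# Hypotheses are simple stochastic strategies over binary actions;
# the data-generating strategy is in the class with positive prior weight.
import random

strategies = [0.1, 0.3, 0.7, 0.9]   # hypothetical class: P(play action 1)
true_idx = 2                         # the truth is in the class
posterior = [0.25, 0.25, 0.25, 0.25] # prior: positive weight on the truth

random.seed(0)
for t in range(200):
    action = 1 if random.random() < strategies[true_idx] else 0
    # Bayes update: weight each hypothesis by its likelihood of the action.
    likes = [p if action == 1 else 1 - p for p in strategies]
    posterior = [w * l for w, l in zip(posterior, likes)]
    z = sum(posterior)
    posterior = [w / z for w in posterior]

# The mixture's prediction of the next action merges with the truth's.
prediction = sum(w * p for w, p in zip(posterior, strategies))
print(f"posterior on truth: {posterior[true_idx]:.3f}, "
      f"predicted P(1): {prediction:.3f} (true: {strategies[true_idx]})")
```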
📝 Abstract
A Bayesian player acting in an infinite multi-player game learns to predict the other players' strategies if his prior assigns positive probability to their play (or contains a grain of truth). Kalai and Lehrer's classic grain of truth problem is to find a reasonably large class of strategies that contains the Bayes-optimal policies with respect to this class, allowing mutually-consistent beliefs about strategy choice that obey the rules of Bayesian inference. Only small classes are known to have a grain of truth and the literature contains several related impossibility results. In this paper we present a formal and general solution to the full grain of truth problem: we construct a class of strategies wide enough to contain all computable strategies as well as Bayes-optimal strategies for every reasonable prior over the class. When the "environment" is a known repeated stage game, we show convergence in the sense of [KL93a] and [KL93b]. When the environment is unknown, agents using Thompson sampling converge to play $\varepsilon$-Nash equilibria in arbitrary unknown computable multi-agent environments. Finally, we include an application to self-predictive policies that avoid planning. While these results use computability theory only as a conceptual tool to solve a classic game theory problem, we show that our solution can naturally be computationally approximated arbitrarily closely.
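The Thompson-sampling result can be pictured with a toy loop: the agent repeatedly samples an opponent model from its posterior, best-responds to the sample, and updates on what it observes. The sketch below is a hedged illustration of that sample-then-best-respond pattern, not the paper's agent; the payoff matrix, opponent types, and horizon are assumptions chosen for the example.

```python
# Toy Thompson sampling against an unknown opponent in a repeated 2x2 game.
import random

# Hypothetical opponent types: probability the opponent plays action 1.
types = [0.2, 0.5, 0.8]
true_type = 0.8
posterior = [1 / 3] * 3

# Assumed payoff table for our agent: payoff[(our_action, their_action)].
payoff = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 2.0}

def best_response(q):
    """Best response if the opponent plays action 1 with probability q."""
    ev0 = (1 - q) * payoff[(0, 0)] + q * payoff[(0, 1)]
    ev1 = (1 - q) * payoff[(1, 0)] + q * payoff[(1, 1)]
    return 0 if ev0 >= ev1 else 1

random.seed(1)
for t in range(300):
    sampled = random.choices(types, weights=posterior)[0]  # Thompson step
    our_action = best_response(sampled)                    # act on the sample
    their_action = 1 if random.random() < true_type else 0
    # Bayes update on the observed opponent action.
    likes = [q if their_action == 1 else 1 - q for q in types]
    posterior = [w * l for w, l in zip(posterior, likes)]
    z = sum(posterior)
    posterior = [w / z for w in posterior]

# The posterior concentrates on the true type, so play settles on a
# (near-)best response to the opponent's actual strategy.
map_type = types[posterior.index(max(posterior))]
print("posterior:", [round(w, 3) for w in posterior],
      "-> play", best_response(map_type))
```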