Towards Model-Free Learning in Dynamic Population Games: An Application to Karma Economies

📅 2026-05-11
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing computational tools for Dynamic Population Games (DPGs) assume full knowledge of the game model and run centrally, which limits their use in realistic settings where agents only have access to their own private experience. This work takes a step towards model-free equilibrium learning in Karma DPGs by studying two scenarios. In the first, a new agent joins a population already at its Stationary Nash Equilibrium (SNE) and learns a policy with Deep Q-Networks (DQN); the analysis gives a suboptimality bound of $O(1/\sqrt{N_s}) + O(1/N)$, where the first term is the DQN approximation error ($N_s$ is the replay buffer size) and the second is the mean-field perturbation error ($N$ is the population size). In the second, agents learn the SNE from scratch; experiments show that combining deep RL with fictitious play and smoothed policy iteration lets agents converge, without access to the game model, to a configuration close to the centrally computed SNE.
📝 Abstract
Dynamic Population Games (DPGs) provide a tractable framework for modeling strategic interactions in large populations of self-interested agents, and have been successfully applied to the design of Karma economies, a class of fair non-monetary resource allocation mechanisms. Despite their appealing theoretical properties, existing computational tools for DPGs assume full knowledge of the game model and operate in a centralized fashion, limiting their applicability in realistic settings where agents have access only to their own private experience. This paper takes a step towards addressing this gap by studying model-free equilibrium learning in Karma DPGs. First, we analyze the setting in which a novel agent joins a Karma DPG already at its Stationary Nash Equilibrium (SNE) and learns a policy via Deep Q-Networks (DQN) without knowledge of the game model. Leveraging recent convergence results for DQN, we establish a suboptimality bound consisting of a DQN approximation error of order $O(1/\sqrt{N_s})$ and a mean field perturbation error of order $O(1/N)$, where $N_s$ is the replay buffer size and $N$ is the population size. Second, we consider the challenging problem of learning the SNE from scratch. We show empirically that combining deep RL with fictitious play and smoothed policy iteration allows agents to converge, in a model-free fashion, to a configuration close to the centrally computed SNE. Together, these contributions support the vision of Karma economies as practical tools for fair resource allocation.
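To make the scaling of the stated bound concrete, here is a minimal sketch that evaluates the two error terms, $O(1/\sqrt{N_s})$ and $O(1/N)$, for a few buffer and population sizes. The function name `suboptimality_bound` and the constants `c_dqn` and `c_mf` are hypothetical placeholders chosen for illustration; the abstract gives only the orders of the two terms, not their constants.

```python
import math


def suboptimality_bound(n_s: int, n: int, c_dqn: float = 1.0, c_mf: float = 1.0) -> float:
    """Evaluate c_dqn / sqrt(n_s) + c_mf / n.

    n_s is the replay buffer size and n is the population size.
    c_dqn and c_mf are hypothetical constants; the source states only
    the orders O(1/sqrt(N_s)) and O(1/N).
    """
    if n_s <= 0 or n <= 0:
        raise ValueError("n_s and n must be positive")
    dqn_error = c_dqn / math.sqrt(n_s)   # DQN approximation error term
    mean_field_error = c_mf / n          # mean field perturbation error term
    return dqn_error + mean_field_error


if __name__ == "__main__":
    # Compare how each term shrinks as the buffer or the population grows.
    for n_s, n in [(100, 10), (10_000, 10), (10_000, 1_000)]:
        print(f"N_s={n_s:>6}, N={n:>5}: bound = {suboptimality_bound(n_s, n):.4f}")
```

Under these placeholder constants, quadrupling the replay buffer halves the first term, while doubling the population halves the second, so for a fixed population the mean field term eventually dominates however large the buffer becomes.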
Problem

Research questions and friction points this paper is trying to address.

Dynamic Population Games
Model-Free Learning
Karma Economies
Equilibrium Learning
Stationary Nash Equilibrium
Innovation

Methods, ideas, or system contributions that make the work stand out.

Model-Free Learning
Dynamic Population Games
Karma Economies
Deep Q-Networks
Stationary Nash Equilibrium