AI Summary
This work addresses the challenge of policy learning in decentralized multi-agent systems under limited observability and weak environmental coupling, where agents' cognitive constraints hinder accurate environment modeling. The authors propose that each agent independently constructs a potentially misspecified belief model based on local observations and employs either Q-value iteration or a softmax policy for decision-making. For the first time, they establish the existence of an Empirical Evidence Equilibrium (EEE) in settings where joint actions influence environmental dynamics, under weak coupling conditions. This result is further extended to softmax policies by deriving corresponding contraction conditions. The study thus provides theoretical convergence guarantees and stability analysis for decentralized decision-making in partially observable multi-agent environments.
Abstract
Strategic multi-agent systems are fundamentally characterized by decentralization, uncertainty, and ambiguity. Agents operating under limited observations will often need to make decisions based on simplified internal models of the environment, reflecting bounded rationality in both computational capacity and environmental knowledge. The Empirical Evidence Equilibrium (EEE) framework explicitly accounts for these limitations by modeling each agent as forming a potentially misspecified belief derived from signals obtained through partial observations of the environment. The resulting equilibrium concept captures the system's steady state under bounded rationality and decentralization. In this work, we study games in which the environment dynamics are driven jointly by exogenous factors and agents' actions. We analyze agent behavior under Q-value iteration where each agent independently forms a belief model, computes Q-values, and derives a greedy strategy, yet the collective actions of all agents jointly shape the environment each agent faces at the next stage. We prove that despite this decentralization, an EEE emerges from the joint dynamics when the coupling between agents' actions and the environment is sufficiently weak. We further extend this result to softmax policies, establishing a contraction result under a sufficient coupling condition.
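As a rough illustration of the per-agent step described above (form a belief model, compute Q-values, derive a greedy or softmax strategy), the following is a minimal single-agent sketch, not the authors' implementation. The belief model `P`, the rewards `r`, the discount factor `gamma`, and the softmax `temperature` are hypothetical values chosen for illustration. The sketch does not model the other agents, whose joint actions would shape the environment and hence the belief each agent re-estimates at the next stage.

```python
import math

def q_value_iteration(P, r, gamma=0.9, tol=1e-10, max_iters=10_000):
    """Q-value iteration on one agent's (possibly misspecified) belief model.

    P[s][a][s2] is the believed probability of moving from signal s to s2
    under action a; r[s][a] is the reward. Both are assumed inputs.
    """
    n_s, n_a = len(r), len(r[0])
    Q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(max_iters):
        V = [max(row) for row in Q]  # greedy value of each signal
        Q_new = [
            [r[s][a] + gamma * sum(P[s][a][s2] * V[s2] for s2 in range(n_s))
             for a in range(n_a)]
            for s in range(n_s)
        ]
        diff = max(abs(Q_new[s][a] - Q[s][a])
                   for s in range(n_s) for a in range(n_a))
        Q = Q_new
        if diff < tol:
            break
    return Q

def greedy_policy(Q):
    """Index of the highest-valued action for each signal."""
    return [max(range(len(row)), key=row.__getitem__) for row in Q]

def softmax_policy(Q, temperature=1.0):
    """Action probabilities proportional to exp(Q / temperature)."""
    policy = []
    for row in Q:
        m = max(row)  # subtract the max for numerical stability
        w = [math.exp((q - m) / temperature) for q in row]
        z = sum(w)
        policy.append([x / z for x in w])
    return policy

# Hypothetical belief model: two signals, two actions, uniform transitions.
P = [[[0.5, 0.5], [0.5, 0.5]],
     [[0.5, 0.5], [0.5, 0.5]]]
r = [[1.0, 0.0],
     [0.0, 2.0]]

Q = q_value_iteration(P, r)
greedy = greedy_policy(Q)
soft = softmax_policy(Q)
```

In the setting of the abstract, each agent would run a step like this on its own belief model, and the belief would then be re-estimated from signals generated under the agents' joint behavior. The equilibrium question is whether that outer loop settles, which this sketch does not attempt to show.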