🤖 AI Summary
Problem: Control-as-inference (CAI) frameworks struggle to scale to decentralized multi-agent stochastic games (SGs) due to non-stationarity and misaligned agent objectives.
Method: We propose a novel variational inference formulation for general-sum SGs, casting decentralized joint policy learning as a distributed variational inference problem. The formulation accounts for non-stationarity and objective misalignment; we prove that the resulting policies constitute an ε-Nash equilibrium and establish convergence guarantees for the decentralized algorithms.
Contribution/Results: The framework connects multi-agent reinforcement learning, game theory (Nash, mean-field Nash, and correlated equilibria), and decentralized optimization, yielding several provably convergent equilibrium-finding algorithms that operate without global coordination. Theoretically, it provides robustness and interpretability in non-stationary environments; empirically, it improves cooperative decision-making performance and stability under decentralization.
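The ε-Nash property has a simple operational meaning: no agent can raise its expected return by more than ε through a unilateral deviation, i.e. $V_i^{\pi_i,\pi_{-i}} \ge V_i^{\pi'_i,\pi_{-i}} - \epsilon$ for every agent $i$ and every alternative policy $\pi'_i$. The sketch below checks this condition in the simplest special case, a one-shot two-player general-sum matrix game (a single-state SG). It is an illustration under that assumption, not the paper's algorithm, and the function names (`nash_gap`, `is_epsilon_nash`, etc.) are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def expected_payoff(M: Matrix, x: Sequence[float], y: Sequence[float]) -> float:
    """Expected payoff x^T M y when row plays mixed strategy x and column plays y."""
    return sum(x[i] * M[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


def best_row_payoff(A: Matrix, y: Sequence[float]) -> float:
    """Best payoff the row player can get against y (pure best responses suffice)."""
    return max(sum(A[i][j] * y[j] for j in range(len(y))) for i in range(len(A)))


def best_col_payoff(B: Matrix, x: Sequence[float]) -> float:
    """Best payoff the column player can get against x (pure best responses suffice)."""
    return max(sum(x[i] * B[i][j] for i in range(len(x))) for j in range(len(B[0])))


def nash_gap(A: Matrix, B: Matrix, x: Sequence[float], y: Sequence[float]) -> float:
    """Largest gain any single player could obtain by deviating unilaterally."""
    row_gain = best_row_payoff(A, y) - expected_payoff(A, x, y)
    col_gain = best_col_payoff(B, x) - expected_payoff(B, x, y)
    return max(row_gain, col_gain)


def is_epsilon_nash(A: Matrix, B: Matrix, x: Sequence[float],
                    y: Sequence[float], eps: float) -> bool:
    """(x, y) is an eps-Nash equilibrium iff no player gains more than eps by deviating."""
    return nash_gap(A, B, x, y) <= eps


if __name__ == "__main__":
    # Matching pennies: zero-sum, unique equilibrium at uniform mixing.
    A = [[1.0, -1.0], [-1.0, 1.0]]
    B = [[-a for a in row] for row in A]
    print(nash_gap(A, B, [0.5, 0.5], [0.5, 0.5]))  # uniform play: no profitable deviation
    print(nash_gap(A, B, [1.0, 0.0], [1.0, 0.0]))  # pure (H, H): column player can gain 2
```

In a full stochastic game, the same check applies with each player's best-response value computed by dynamic programming over states, since a best response to fixed stationary opponent policies is a single-agent MDP.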
📝 Abstract
The Control as Inference (CAI) framework has successfully transformed single-agent reinforcement learning (RL) by reframing control tasks as probabilistic inference problems. However, the extension of CAI to multi-agent, general-sum stochastic games (SGs) remains underexplored, particularly in decentralized settings where agents operate independently without centralized coordination. In this paper, we propose a novel variational inference framework tailored to decentralized multi-agent systems. The framework addresses the challenges posed by non-stationarity and misaligned agent objectives, and we prove that the resulting policies form an $\epsilon$-Nash equilibrium. Additionally, we establish convergence guarantees for the proposed decentralized algorithms. Leveraging this framework, we instantiate multiple algorithms to solve for Nash equilibrium, mean-field Nash equilibrium, and correlated equilibrium, each with a rigorous convergence analysis.
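For readers new to CAI, the single-agent construction that the abstract builds on can be sketched in a few lines. Assuming a tabular MDP with known dynamics, introducing optimality variables with $p(\mathcal{O}_t = 1 \mid s_t, a_t) \propto \exp(r(s_t, a_t)/\alpha)$ and fitting a variational policy with the dynamics held fixed leads to soft value iteration (a discounted, infinite-horizon variant is used here): $Q(s,a) = r(s,a) + \gamma \sum_{s'} P(s' \mid s,a) V(s')$, $V(s) = \alpha \log \sum_a \exp(Q(s,a)/\alpha)$, and $\pi(a \mid s) = \exp((Q(s,a) - V(s))/\alpha)$. This is the standard single-agent maximum-entropy recursion, not the paper's decentralized multi-agent algorithm; the function names, the discount $\gamma$, and the temperature $\alpha$ are illustrative assumptions.

```python
import math
from typing import List, Tuple

# P[s][a] is a list of (probability, next_state) pairs; R[s][a] is the reward.
Transitions = List[List[List[Tuple[float, int]]]]
Rewards = List[List[float]]


def soft_max(values: List[float], alpha: float) -> float:
    """Numerically stable alpha * log(sum(exp(v / alpha)))."""
    m = max(values)
    return m + alpha * math.log(sum(math.exp((v - m) / alpha) for v in values))


def soft_value_iteration(P: Transitions, R: Rewards, gamma: float = 0.9,
                         alpha: float = 1.0, iters: int = 500):
    """Iterate the soft Bellman backup; return (V, Q, policy). Requires iters >= 1."""
    n_states = len(R)
    V = [0.0] * n_states
    Q: List[List[float]] = []
    for _ in range(iters):
        # Soft Q backup using the current value estimate.
        Q = [[R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
              for a in range(len(R[s]))]
             for s in range(n_states)]
        # Log-sum-exp replaces the hard max of ordinary value iteration.
        V = [soft_max(Q[s], alpha) for s in range(n_states)]
    # Boltzmann policy; each row sums to 1 because V is the soft max of the same Q.
    policy = [[math.exp((q - V[s]) / alpha) for q in Q[s]] for s in range(n_states)]
    return V, Q, policy


if __name__ == "__main__":
    # One state, two self-looping actions; action 0 pays 1, action 1 pays 0.
    P = [[[(1.0, 0)], [(1.0, 0)]]]
    R = [[1.0, 0.0]]
    V, Q, policy = soft_value_iteration(P, R)
    print(V[0], policy[0])
```

In the toy example the two Q-values differ by exactly 1, so with $\alpha = 1$ the policy prefers action 0 with odds $e$ to 1 rather than choosing it deterministically. Letting $\alpha \to 0$ recovers the hard-max Bellman backup, while a larger $\alpha$ yields a more stochastic policy.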