🤖 AI Summary
Problem: Conventional analyses of multi-agent reinforcement learning (MARL), such as replicator dynamics, typically rely on mean-field approximations and therefore struggle to model individual decision-making. By averaging out the intrinsic stochasticity from exploration noise, environmental randomness, and stochastic gradient updates, they can produce discrepancies between theoretical predictions and empirical learning trajectories.
Method: This work introduces the first coupled stochastic dynamical systems framework for MARL, explicitly preserving multiple algorithm-level stochastic sources. Leveraging stochastic differential equations and dynamical systems theory, we conduct agent-level stability and sensitivity analyses.
Contribution: Our framework relaxes the restrictive mean-field smoothness assumption, enabling precise characterization of individual agent learning dynamics. It uncovers universal principles governing decision evolution under stochastic perturbations and provides novel theoretical tools and design principles for enhancing interpretability, safety, and controllability of MARL systems.
📝 Abstract
Analysing learning behaviour in Multi-Agent Reinforcement Learning (MARL) environments is challenging, particularly with respect to \textit{individual} decision-making. Practitioners tend to study or compare MARL algorithms qualitatively, largely because of the inherent stochasticity of practical algorithms, which arises from random dithering exploration strategies, environment transition noise, and stochastic gradient updates, to name a few sources. Traditional analytical approaches, such as replicator dynamics, often rely on mean-field approximations to remove stochastic effects. This simplification captures general overall trends, but it may lead to dissonance between analytical predictions and the actual realisations of individual trajectories. In this paper, we propose a novel perspective on MARL systems by modelling them as \textit{coupled stochastic dynamical systems}, capturing both agent interactions and environmental characteristics. Leveraging tools from dynamical systems theory, we analyse the stability and sensitivity of agent behaviour at the individual level, which are key dimensions for practical deployment, for example in the presence of strict safety requirements. This framework allows us, for the first time, to rigorously study MARL dynamics while taking their inherent stochasticity into account, providing a deeper understanding of system behaviour and practical insights for the design and control of multi-agent learning processes.
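To make the contrast between a deterministic (mean-field style) prediction and individual stochastic trajectories concrete, the following is a minimal sketch and not the paper's model. It assumes a hypothetical two-agent matching-pennies game, uses the standard two-population replicator equation as the drift, and adds an assumed multiplicative noise term integrated with the Euler-Maruyama scheme. The function name `simulate`, the noise form, and all parameter values are illustrative choices.

```python
import math
import random


def simulate(sigma, steps=1000, dt=0.01, seed=0, x0=0.6, y0=0.4):
    """Euler-Maruyama integration of two coupled policy variables.

    x, y: probability that agent 1 / agent 2 plays "heads" in matching
    pennies (a hypothetical game chosen only for illustration).
    sigma: noise intensity; sigma = 0 gives the deterministic replicator flow.
    Returns a list of (x, y) pairs of length steps + 1.
    """
    rng = random.Random(seed)
    eps = 1e-6  # keep probabilities strictly inside (0, 1)
    x, y = x0, y0
    trajectory = [(x, y)]
    for _ in range(steps):
        # Replicator drift: each agent's drift depends on the other's policy,
        # which is what couples the two equations.
        fx = x * (1.0 - x) * (4.0 * y - 2.0)  # agent 1 wants to match
        fy = y * (1.0 - y) * (2.0 - 4.0 * x)  # agent 2 wants to mismatch
        # Independent Brownian increments, one per agent.
        dwx = math.sqrt(dt) * rng.gauss(0.0, 1.0)
        dwy = math.sqrt(dt) * rng.gauss(0.0, 1.0)
        # Assumed multiplicative noise that vanishes at the boundaries.
        x_new = x + fx * dt + sigma * x * (1.0 - x) * dwx
        y_new = y + fy * dt + sigma * y * (1.0 - y) * dwy
        # Simultaneous update, clipped to stay a valid probability.
        x = min(max(x_new, eps), 1.0 - eps)
        y = min(max(y_new, eps), 1.0 - eps)
        trajectory.append((x, y))
    return trajectory


if __name__ == "__main__":
    deterministic = simulate(sigma=0.0)
    for seed in range(3):
        run = simulate(sigma=0.3, seed=seed)
        gap = max(abs(a[0] - b[0]) for a, b in zip(run, deterministic))
        print(f"seed={seed}: max |x - x_deterministic| = {gap:.3f}")
```

With `sigma=0.0` every seed yields the same trajectory, which is the kind of single averaged prediction a deterministic analysis provides. With `sigma > 0`, each seed gives a different individual realisation, and the printed gap shows how far a single run can drift from the deterministic curve.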