🤖 AI Summary
Mean-field (MF) approaches address the curse of dimensionality in large-scale multi-agent reinforcement learning (MARL), but their coarse-grained aggregation of agents neglects agent heterogeneity and introduces substantial aggregation noise. This paper proposes the Bi-level Mean-Field (BMF) framework, which has two components: a bi-level interaction mechanism that explicitly decouples fine-grained intra-group interactions from coarse-grained inter-group interactions, and a VAE-based dynamic grouping module that adaptively models agent heterogeneity. BMF preserves the scalability of MF methods while improving the fidelity of individual policy representations. Experiments on multiple large-scale MARL benchmarks report 32% faster convergence and an average 27% improvement in policy performance over state-of-the-art methods. BMF also suppresses aggregation noise throughout training and establishes new performance benchmarks in scalable MARL.
📝 Abstract
Large-scale Multi-Agent Reinforcement Learning (MARL) often suffers from the curse of dimensionality, as the exponential growth in agent interactions significantly increases computational complexity and impedes learning efficiency. To mitigate this, existing efforts that rely on Mean Field (MF) simplify the interaction landscape by approximating neighboring agents as a single mean agent, thus reducing overall complexity to pairwise interactions. However, these MF methods inevitably fail to account for individual differences, leading to aggregation noise caused by inaccurate iterative updates during MF learning. In this paper, we propose a Bi-level Mean Field (BMF) method to capture agent diversity with dynamic grouping in large-scale MARL, which can alleviate aggregation noise via bi-level interaction. Specifically, BMF introduces a dynamic group assignment module, which employs a Variational AutoEncoder (VAE) to learn the representations of agents, facilitating their dynamic grouping over time. Furthermore, we propose a bi-level interaction module to model both inter- and intra-group interactions for effective neighboring aggregation. Experiments across various tasks demonstrate that the proposed BMF yields results superior to the state-of-the-art methods. Our code will be made publicly available.
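To make the contrast between flat and bi-level aggregation concrete, the following is a minimal sketch, not the authors' implementation. It assumes one-hot action vectors and fixed integer group labels; in the paper the groups come from a VAE-based assignment module, which is not modeled here. The function names (`mean_action`, `flat_mean_field`, `bilevel_mean_field`) and the choice to keep one mean per other group are illustrative assumptions.

```python
from collections import defaultdict


def mean_action(actions):
    """Element-wise average of a non-empty list of equal-length action vectors."""
    n = len(actions)
    dim = len(actions[0])
    return [sum(a[d] for a in actions) / n for d in range(dim)]


def flat_mean_field(neighbor_actions):
    """Standard MF view: all neighbors collapse into a single mean agent."""
    return mean_action(neighbor_actions)


def bilevel_mean_field(agent_group, neighbor_actions, neighbor_groups):
    """Hypothetical bi-level view for one agent.

    Returns (intra_mean, inter_means):
      intra_mean  - mean action of neighbors in the agent's own group,
                    or None if no neighbor shares the group
      inter_means - dict mapping each other group id to that group's mean action
    """
    by_group = defaultdict(list)
    for action, group in zip(neighbor_actions, neighbor_groups):
        by_group[group].append(action)

    intra_mean = mean_action(by_group[agent_group]) if agent_group in by_group else None
    inter_means = {
        group: mean_action(actions)
        for group, actions in by_group.items()
        if group != agent_group
    }
    return intra_mean, inter_means


# Four neighbors, two groups with opposite behavior (assumed toy data).
neighbor_actions = [[1, 0], [1, 0], [0, 1], [0, 1]]
neighbor_groups = [0, 0, 1, 1]

flat = flat_mean_field(neighbor_actions)                              # [0.5, 0.5]
intra, inter = bilevel_mean_field(0, neighbor_actions, neighbor_groups)
# intra == [1.0, 0.0], inter == {1: [0.0, 1.0]}
```

In this toy case the flat mean `[0.5, 0.5]` describes none of the neighbors, which is one way to picture the aggregation noise the abstract refers to, while the per-group means retain each group's behavior.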