🤖 AI Summary
Problem: Multi-agent reinforcement learning (MARL) systems face severe coordination challenges and scalability bottlenecks as the number of agents increases. Method: This paper proposes LEED, a decentralized MARL framework that uses large language models (LLMs) to generate environment-specific instructions, which in turn yield high-quality expert demonstrations; each agent then incorporates these demonstrations into its local policy optimization without centralized communication. An expert policy loss, built from the demonstrations, is combined with each agent's own policy loss to guide learning through imitation. Contribution/Results: LEED improves sample efficiency and training speed while preserving decentralization. Experiments across multiple benchmark tasks show that LEED outperforms existing state-of-the-art methods in both final performance and convergence rate, and that it scales well as the number of agents grows.
📝 Abstract
Multi-agent reinforcement learning (MARL) holds substantial promise for intelligent decision-making in complex environments. However, it suffers from a coordination and scalability bottleneck as the number of agents increases. To address these issues, we propose the LLM-empowered expert demonstrations framework for multi-agent reinforcement learning (LEED). LEED consists of two components: a demonstration generation (DG) module and a policy optimization (PO) module. Specifically, the DG module leverages large language models to generate instructions for interacting with the environment, thereby producing high-quality demonstrations. The PO module adopts a decentralized training paradigm, where each agent utilizes the generated demonstrations to construct an expert policy loss, which is then integrated with its own policy loss. This enables each agent to effectively personalize and optimize its local policy based on both expert knowledge and individual experience. Experimental results show that LEED achieves superior sample efficiency, time efficiency, and robust scalability compared to state-of-the-art baselines.
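The abstract says each agent integrates an expert policy loss with its own policy loss but does not give the form of either term. The sketch below is one plausible reading, not the authors' implementation: it assumes a discrete action space, a REINFORCE-style surrogate for the agent's own loss, a behavior-cloning (negative log-likelihood) term for the expert loss, and a hypothetical mixing coefficient `expert_weight`. All function and parameter names are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def own_policy_loss(logits_batch, actions, advantages):
    """Assumed REINFORCE-style surrogate: -mean(advantage * log pi(a | s))."""
    terms = [
        -adv * math.log(softmax(logits)[a])
        for logits, a, adv in zip(logits_batch, actions, advantages)
    ]
    return sum(terms) / len(terms)


def expert_policy_loss(logits_batch, expert_actions):
    """Assumed imitation term: mean negative log-likelihood of the
    demonstrated actions under the agent's current policy."""
    terms = [
        -math.log(softmax(logits)[a])
        for logits, a in zip(logits_batch, expert_actions)
    ]
    return sum(terms) / len(terms)


def combined_loss(own_batch, demo_batch, expert_weight=0.5):
    """Per-agent objective: own loss plus a weighted expert loss.

    own_batch  = (logits, actions, advantages) from the agent's own rollouts
    demo_batch = (logits, expert_actions) evaluated on demonstration states
    expert_weight is a hypothetical hyperparameter, not taken from the paper.
    """
    own = own_policy_loss(*own_batch)
    expert = expert_policy_loss(*demo_batch)
    return own + expert_weight * expert


# Each agent would evaluate this locally on its own data, with no shared critic.
own_batch = ([[0.0, 0.0]], [0], [2.0])   # one step, uniform policy, advantage 2
demo_batch = ([[0.0, 0.0]], [1])         # one demonstrated step
loss = combined_loss(own_batch, demo_batch, expert_weight=0.5)
```

Under these assumptions, a larger `expert_weight` pulls the agent toward the LLM-derived demonstrations, while a smaller one lets its own experience dominate. Because every term depends only on one agent's rollouts and demonstrations, the computation stays decentralized.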