LEED: A Highly Efficient and Scalable LLM-Empowered Expert Demonstrations Framework for Multi-Agent Reinforcement Learning

📅 2025-09-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Scaling multi-agent reinforcement learning (MARL) systems faces severe coordination challenges and scalability bottlenecks as the agent count increases. Method: This paper proposes LEED, a decentralized MARL framework that leverages large language models (LLMs) to generate high-quality, environment-specific instructional demonstrations; each agent then incorporates these expert demonstrations into its local policy optimization without centralized communication. A novel expert-policy loss function explicitly guides policy learning via demonstration imitation. Contribution/Results: Extensive experiments across multiple benchmark tasks show that LEED outperforms existing state-of-the-art methods in both final performance and convergence rate, achieving strong scalability, high sample efficiency, and rapid convergence without requiring centralized coordination or explicit inter-agent communication.

📝 Abstract
Multi-agent reinforcement learning (MARL) holds substantial promise for intelligent decision-making in complex environments. However, it suffers from a coordination and scalability bottleneck as the number of agents increases. To address these issues, we propose the LLM-empowered expert demonstrations framework for multi-agent reinforcement learning (LEED). LEED consists of two components: a demonstration generation (DG) module and a policy optimization (PO) module. Specifically, the DG module leverages large language models to generate instructions for interacting with the environment, thereby producing high-quality demonstrations. The PO module adopts a decentralized training paradigm, where each agent utilizes the generated demonstrations to construct an expert policy loss, which is then integrated with its own policy loss. This enables each agent to effectively personalize and optimize its local policy based on both expert knowledge and individual experience. Experimental results show that LEED achieves superior sample efficiency, time efficiency, and robust scalability compared to state-of-the-art baselines.
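The abstract describes each agent combining an expert policy loss (built from LLM-generated demonstrations) with its own policy loss. A minimal sketch of such a combined objective, assuming a REINFORCE-style policy-gradient term plus a cross-entropy imitation term (the paper's exact loss form and weighting are not given here; `expert_weight` and all function names are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def combined_policy_loss(policy_logits, actions, advantages,
                         expert_actions, expert_weight=0.5):
    """Hypothetical per-agent objective: own-experience policy-gradient
    loss plus an imitation term on expert-demonstrated actions."""
    log_probs = F.log_softmax(policy_logits, dim=-1)
    # Own-experience term: advantage-weighted log-probability of taken actions
    taken_log_probs = log_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    pg_loss = -(taken_log_probs * advantages).mean()
    # Expert-policy loss: cross-entropy toward demonstrated actions
    expert_loss = F.cross_entropy(policy_logits, expert_actions)
    return pg_loss + expert_weight * expert_loss
```

In this reading, `expert_weight` trades off imitation of the LLM demonstrations against the agent's individual experience, which matches the abstract's claim that each agent "personalizes" its local policy from both sources.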
Problem

Research questions and friction points this paper is trying to address.

Coordination and scalability bottlenecks in multi-agent reinforcement learning as the number of agents grows
How to obtain high-quality expert demonstrations for agents without human experts, using LLMs
How to combine expert knowledge with individual experience under fully decentralized policy optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages LLMs to generate expert demonstrations
Uses decentralized training with expert policy loss
Integrates expert knowledge with individual experience
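The demonstration generation (DG) module is described as prompting an LLM for environment-interaction instructions and recording the resulting high-quality trajectories. A minimal sketch of that loop, assuming a step-wise prompting scheme; `query_llm`, the prompt format, and the `env` interface are all stand-ins, not the paper's actual design:

```python
def generate_demonstrations(env, query_llm, num_episodes=10):
    """Hypothetical DG-module sketch: ask an LLM for an action at each
    step and record the trajectory as an expert demonstration.
    `query_llm` is any callable mapping a prompt string to an action."""
    demos = []
    for _ in range(num_episodes):
        obs = env.reset()
        trajectory = []
        done = False
        while not done:
            prompt = f"Observation: {obs}. Choose an action index."
            action = query_llm(prompt)  # LLM-proposed action
            next_obs, reward, done = env.step(action)
            trajectory.append((obs, action, reward))
            obs = next_obs
        demos.append(trajectory)
    return demos
```

The recorded `demos` would then feed the expert-policy loss on each agent's side, keeping generation and policy optimization decoupled so no centralized coordination is needed at training time.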
Tianyang Duan
Department of Computer Science, The University of Hong Kong, Hong Kong, China.
Zongyuan Zhang
Department of Computer Science, The University of Hong Kong, Hong Kong, China.
Songxiao Guo
Department of Computer Science, The University of Hong Kong, Hong Kong, China.
Dong Huang
Institute of Data Science, National University of Singapore, Singapore.
Yuanye Zhao
College of International Education, Hebei University of Economics and Business, China.
Zheng Lin
Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, China.
Zihan Fang
Department of Computer Science, City University of Hong Kong, Hong Kong, China.
Dianxin Luan
Institute for Imaging, Data and Communications, University of Edinburgh, UK.
Heming Cui
University of Hong Kong
Operating Systems · Programming Languages · Distributed Systems · Security
Yong Cui
Professor of Computer Science, Tsinghua University
Network Architecture · Mobile Computing