LEED: A Highly Efficient and Scalable LLM-Empowered Expert Demonstrations Framework for Multi-Agent Reinforcement Learning

📅 2025-09-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Scaling multi-agent reinforcement learning (MARL) runs into severe coordination challenges and scalability bottlenecks as the number of agents grows. Method: This paper proposes LEED, a decentralized MARL framework in which large language models (LLMs) generate instructions for interacting with the environment, and executing those instructions yields high-quality, environment-specific demonstrations. Each agent then incorporates these expert demonstrations into its local policy optimization without centralized communication. A novel expert policy loss, built from the demonstrations and combined with each agent's own policy loss, explicitly guides policy learning toward the demonstrated behavior. Contribution/Results: LEED improves sample efficiency and training speed while preserving decentralized training. Experiments across multiple benchmark tasks show that LEED outperforms existing state-of-the-art methods in both final performance and convergence rate and scales well with the number of agents, without requiring centralized coordination or explicit inter-agent communication.
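The summary does not state the exact form of the expert policy loss. One plausible reading, written here as an assumption rather than the paper's definition, is a weighted sum of agent i's own policy loss and a behavior-cloning (negative log-likelihood) term over the LLM-generated demonstration set D, with a hypothetical weight λ:

```latex
\mathcal{L}_i(\theta_i) = \mathcal{L}^{\text{own}}_i(\theta_i) + \lambda\,\mathcal{L}^{\text{exp}}_i(\theta_i),
\qquad
\mathcal{L}^{\text{exp}}_i(\theta_i) = -\frac{1}{|\mathcal{D}|}\sum_{(s,a)\in\mathcal{D}} \log \pi_{\theta_i}(a \mid s)
```

Here θ_i are agent i's policy parameters and L^own_i stands for whatever RL objective the agent already optimizes on its own experience; both the imitation form of the expert term and the weight λ are illustrative assumptions.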

📝 Abstract

Multi-agent reinforcement learning (MARL) holds substantial promise for intelligent decision-making in complex environments. However, it suffers from a coordination and scalability bottleneck as the number of agents increases. To address these issues, we propose the LLM-empowered expert demonstrations framework for multi-agent reinforcement learning (LEED). LEED consists of two components: a demonstration generation (DG) module and a policy optimization (PO) module. Specifically, the DG module leverages large language models to generate instructions for interacting with the environment, thereby producing high-quality demonstrations. The PO module adopts a decentralized training paradigm, where each agent utilizes the generated demonstrations to construct an expert policy loss, which is then integrated with its own policy loss. This enables each agent to effectively personalize and optimize its local policy based on both expert knowledge and individual experience. Experimental results show that LEED achieves superior sample efficiency, time efficiency, and robust scalability compared to state-of-the-art baselines.
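The demonstration generation (DG) step can be pictured with a minimal offline sketch: an "LLM" turns a task description into an executable instruction, and rolling that instruction out in the environment records (state, action) demonstrations. Everything here is hypothetical: `mock_llm_instruction` is a hard-coded stand-in for a real LLM call, and the 1-D line environment (`toy_step`) and task text are illustrative, not taken from the paper.

```python
import random
from typing import Callable, List, Tuple

State = int   # position on a 1-D line (hypothetical toy environment)
Action = int  # 0 = step left, 1 = step right, 2 = stay
Demo = Tuple[State, Action]


def mock_llm_instruction(task_description: str) -> Callable[[State], Action]:
    """Hypothetical stand-in for the LLM in the DG module.

    A real system would send `task_description` (plus an environment
    description) to an LLM and turn its reply into an executable rule.
    Here the rule is hard-coded so the sketch runs offline.
    """
    def instruction(state: State) -> Action:
        if state > 0:
            return 0  # step left toward the goal at 0
        if state < 0:
            return 1  # step right toward the goal at 0
        return 2      # already at the goal: stay
    return instruction


def toy_step(state: State, action: Action) -> State:
    """Hypothetical environment: move on the line, clipped to [-5, 5]."""
    delta = {0: -1, 1: 1, 2: 0}[action]
    return max(-5, min(5, state + delta))


def generate_demonstrations(n_episodes: int, horizon: int, seed: int = 0) -> List[Demo]:
    """Execute the LLM-derived instruction and record (state, action) pairs."""
    rng = random.Random(seed)
    instruction = mock_llm_instruction("Reach position 0 and stay there.")
    demos: List[Demo] = []
    for _ in range(n_episodes):
        state = rng.randint(-5, 5)  # random start position
        for _ in range(horizon):
            action = instruction(state)
            demos.append((state, action))
            state = toy_step(state, action)
    return demos


demos = generate_demonstrations(n_episodes=3, horizon=4)
print(len(demos))  # 12
```

Swapping the mock for a real LLM would only change how `instruction` is produced; the rollout loop that converts instructions into demonstrations stays the same.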

Problem

Research questions and friction points this paper is trying to address.

Addresses coordination and scalability bottlenecks in multi-agent reinforcement learning

Leverages LLMs to generate high-quality expert demonstrations for agents

Enables decentralized policy optimization combining expert knowledge with individual experience

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages LLMs to generate expert demonstrations

Uses decentralized training with expert policy loss

Integrates expert knowledge with individual experience
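The combination of decentralized training and an expert policy loss can be illustrated with a minimal sketch. It assumes tabular softmax policies, a REINFORCE-style term for each agent's own experience, and a behavior-cloning term for the expert loss weighted by a hypothetical `lam`; `Agent`, `train`, the toy reward, and the hard-coded demonstrations are illustrative choices, not the paper's implementation. Each agent updates only its own parameters and never exchanges messages or gradients with the others.

```python
import math
import random
from typing import List, Tuple

N_ACTIONS = 3  # hypothetical action space


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class Agent:
    """One decentralized agent with its own tabular softmax policy."""

    def __init__(self, n_states: int, lr: float = 0.5, lam: float = 1.0):
        self.logits = [[0.0] * N_ACTIONS for _ in range(n_states)]
        self.lr = lr
        self.lam = lam  # hypothetical weight on the expert policy loss

    def probs(self, s: int) -> List[float]:
        return softmax(self.logits[s])

    def update(self, own: List[Tuple[int, int, float]],
               demos: List[Tuple[int, int]]) -> float:
        """One gradient step on L_own + lam * L_expert; returns the loss before the step."""
        grads = [[0.0] * N_ACTIONS for _ in self.logits]
        loss = 0.0
        # Own-experience term (REINFORCE-style): mean of -G * log pi(a|s).
        for s, a, g in own:
            p = self.probs(s)
            loss -= g * math.log(p[a]) / len(own)
            for b in range(N_ACTIONS):
                grads[s][b] += g * (p[b] - float(b == a)) / len(own)
        # Expert term (behavior cloning): mean of -log pi(a_demo|s_demo).
        for s, a in demos:
            p = self.probs(s)
            loss -= self.lam * math.log(p[a]) / len(demos)
            for b in range(N_ACTIONS):
                grads[s][b] += self.lam * (p[b] - float(b == a)) / len(demos)
        # Gradient descent on this agent's logits only.
        for s, row in enumerate(grads):
            for b, gb in enumerate(row):
                self.logits[s][b] -= self.lr * gb
        return loss


def train(n_agents: int = 2, n_states: int = 5, steps: int = 300, seed: int = 0) -> List[Agent]:
    rng = random.Random(seed)
    # Shared demonstrations (hypothetical): the "expert" always picks action 0.
    demos = [(s, 0) for s in range(n_states)]
    agents = [Agent(n_states) for _ in range(n_agents)]
    for _ in range(steps):
        for agent in agents:  # local updates only; no communication between agents
            s = rng.randrange(n_states)
            a = rng.choices(range(N_ACTIONS), weights=agent.probs(s))[0]
            g = 1.0 if a == 0 else 0.0  # hypothetical reward: +1 for action 0
            agent.update([(s, a, g)], demos)
    return agents


agents = train()
print([round(ag.probs(0)[0], 2) for ag in agents])
```

Because the demonstrations are shared input rather than shared parameters, each agent can still personalize its policy through the own-experience term; raising `lam` pulls the policies more strongly toward the demonstrated actions.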

🔎 Similar Papers

Mutual Enhancement of Large Language and Reinforcement Learning Models through Bi-Directional Feedback Mechanisms: A Case Study