Learning Progress Driven Multi-Agent Curriculum

📅 2022-05-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
In multi-agent reinforcement learning (MARL), reward-based automatic curriculum learning faces two key bottlenecks: high variance in expected returns and deteriorating credit assignment as the number of agents increases. Method: This paper proposes using TD-error-based learning progress, rather than reward, as a dynamic curriculum signal, decoupling the difficulty measure from absolute performance. It combines a sampling mechanism over a distribution of agent counts with progressive transfer from an initial context distribution toward the target task, yielding a learning-progress-driven curriculum framework for MARL under sparse rewards. Contribution/Results: To the authors' knowledge, this is the first work to employ learning progress, rather than cumulative reward, as the curriculum driver in MARL, thereby circumventing both the reward-variance and credit-assignment issues inherent in conventional approaches. Evaluated on three sparse-reward MARL benchmarks, the method outperforms state-of-the-art baselines in both training stability and final performance.
📝 Abstract
The number of agents can be an effective curriculum variable for controlling the difficulty of multi-agent reinforcement learning (MARL) tasks. Existing work typically uses manually defined curricula such as linear schemes. We identify two potential flaws while applying existing reward-based automatic curriculum learning methods in MARL: (1) The expected episode return used to measure task difficulty has high variance; (2) Credit assignment difficulty can be exacerbated in tasks where increasing the number of agents yields higher returns, which is common in many MARL tasks. To address these issues, we propose to control the curriculum by using a TD-error-based *learning progress* measure and by letting the curriculum proceed from an initial context distribution to the final task-specific one. Since our approach maintains a distribution over the number of agents and measures learning progress rather than absolute performance, which often increases with the number of agents, we alleviate problem (2). Moreover, the learning progress measure naturally alleviates problem (1) by aggregating returns. In three challenging sparse-reward MARL benchmarks, our approach outperforms state-of-the-art baselines.
Problem

Research questions and friction points this paper is trying to address.

High variance in measuring MARL task difficulty via returns
Credit assignment worsens with more agents in MARL
Need adaptive curriculum for agent-count control in MARL
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses TD-error based learning progress measure
Controls curriculum via agent number distribution
Aggregates returns to reduce variance
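The innovations above can be sketched in code. The following is a minimal illustrative implementation, not the paper's exact formulation: the class name, the EMA smoothing of TD-errors, and the softmax weighting over progress are all assumptions introduced here for clarity. It maintains a distribution over agent counts and samples the next training task in proportion to recent learning progress, measured as the change in a smoothed mean absolute TD-error.

```python
import numpy as np

class LearningProgressCurriculum:
    """Illustrative sketch: sample the number of agents from a distribution
    weighted by learning progress, estimated as the change in an
    exponentially smoothed mean absolute TD-error per agent count."""

    def __init__(self, agent_counts, ema_decay=0.9, temperature=1.0, seed=0):
        self.agent_counts = list(agent_counts)
        self.ema = {n: None for n in self.agent_counts}      # smoothed |TD-error|
        self.progress = {n: 0.0 for n in self.agent_counts}  # recent progress
        self.decay = ema_decay
        self.temperature = temperature
        self.rng = np.random.default_rng(seed)

    def update(self, n_agents, td_errors):
        # Learning progress = absolute change in the smoothed mean |TD-error|.
        mean_abs = float(np.mean(np.abs(td_errors)))
        prev = self.ema[n_agents]
        if prev is None:
            self.ema[n_agents] = mean_abs
        else:
            self.ema[n_agents] = self.decay * prev + (1.0 - self.decay) * mean_abs
            self.progress[n_agents] = abs(self.ema[n_agents] - prev)

    def sample(self):
        # Softmax over progress: train more where learning is currently fastest.
        # With no progress recorded yet, this reduces to uniform sampling.
        logits = np.array([self.progress[n] for n in self.agent_counts])
        logits = logits / self.temperature
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        return int(self.rng.choice(self.agent_counts, p=probs))
```

Because the signal is the *change* in TD-error rather than the return itself, agent counts that merely yield higher absolute returns are not favored, which is the intuition behind how the method sidesteps the credit-assignment confound.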
Wenshuai Zhao
Aalto University
Robotics · Reinforcement Learning
Zhiyuan Li
Department of Electrical Engineering and Automation, Aalto University, Espoo, Finland
J. Pajarinen
Department of Electrical Engineering and Automation, Aalto University, Espoo, Finland