🤖 AI Summary
In group-agent reinforcement learning (GARL), heterogeneous agents struggle to efficiently leverage peer experience due to architectural and algorithmic disparities. Method: This paper proposes an adaptive group learning mechanism enabling asynchronous, on-demand knowledge sharing, with dynamic adoption of policies and models. It integrates heterogeneous RL algorithms, online policy/value-function evaluation and switching, asynchronous knowledge distillation, and performance-driven model selection. Contribution/Results: To our knowledge, this is the first scalable, low-coupling collaborative learning paradigm tailored for heterogeneous agents. Evaluated across 43 Atari games, the mechanism accelerates learning for 96% of agents, with 72% achieving >100× speedup and 41% attaining higher cumulative rewards using <5% of the training timesteps—demonstrating substantial improvements in individual learning efficiency and generalization capability.
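The "performance-driven model selection" described above can be pictured as a simple rule: an agent adopts a peer's shared policy/value-function model only when that model's recent evaluation score beats its own. The sketch below is a minimal illustration of that idea, not the paper's actual implementation; the names `SharedModel` and `maybe_adopt` and the use of a single scalar evaluation score are assumptions.

```python
# Minimal sketch (assumed names/structure) of performance-driven model
# adoption: keep your own parameters unless a peer's shared model has a
# strictly higher recent evaluation score.
from dataclasses import dataclass

@dataclass
class SharedModel:
    owner: str          # id of the agent that shared this model (illustrative)
    params: dict        # policy/value-function parameters, treated opaquely here
    eval_score: float   # owner's recent evaluation return with this model

def maybe_adopt(own_score: float, own_params: dict,
                inbox: list[SharedModel]) -> tuple[dict, float]:
    """Return the best-performing parameters among our own model and any
    models received asynchronously from peers."""
    best_params, best_score = own_params, own_score
    for shared in inbox:
        if shared.eval_score > best_score:   # adopt only on strict improvement
            best_params, best_score = shared.params, shared.eval_score
    return best_params, best_score
```

In this toy form, an agent scoring 10.0 that receives a peer model evaluated at 25.0 would switch to the peer's parameters, while a peer model scoring below its own would be ignored, which is the low-coupling, on-demand behavior the summary highlights.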
📝 Abstract
Group-agent reinforcement learning (GARL) is a newly arising learning scenario in which multiple reinforcement learning agents study together as a group, sharing knowledge with one another in an asynchronous fashion. The goal is to improve the learning performance of each individual agent. Under the more general heterogeneous setting, where different agents learn using different algorithms, we advance GARL by designing novel and effective group-learning mechanisms. They guide the agents on whether and how to learn from the action choices of the others, and allow an agent to adopt the policy and value-function models sent by another agent when those models perform better. We have conducted extensive experiments on a total of 43 different Atari 2600 games to demonstrate the superior performance of the proposed method. After group learning, among the 129 agents examined, 96% achieve a learning speed-up, and 72% learn over 100 times faster. Moreover, around 41% of those agents achieve a higher accumulated reward score while using fewer than 5% of the time steps required by a single agent learning on its own.