🤖 AI Summary
This work proposes K-Myriad, a novel framework that addresses the limitations of conventional parallel reinforcement learning methods, which typically accelerate only a single policy and struggle to balance exploration diversity with efficiency. K-Myriad introduces, for the first time, population-level state entropy maximization into unsupervised parallel exploration by co-training a cohort of heterogeneous agents. This approach automatically generates a diverse set of high-quality policies that serve as effective initializations for subsequent reinforcement learning. By integrating unsupervised learning, multi-agent collaboration, and state entropy optimization, the method substantially enhances both policy diversity and sample efficiency in high-dimensional continuous control tasks.
📝 Abstract
Parallelization in Reinforcement Learning is typically employed to speed up the training of a single policy, with multiple workers collecting experience from an identical sampling distribution. This common design limits the potential of parallelization by neglecting the advantages of diverse exploration strategies. We propose K-Myriad, a scalable and unsupervised method that maximizes the collective state entropy induced by a population of parallel policies. By cultivating a portfolio of specialized exploration strategies, K-Myriad provides a robust initialization for Reinforcement Learning, leading to both higher training efficiency and the discovery of heterogeneous solutions. Experiments on high-dimensional continuous control tasks with large-scale parallelization demonstrate that K-Myriad learns a broad set of distinct policies, highlighting its effectiveness for collective exploration and paving the way towards novel parallelization strategies.
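To make the "collective state entropy" objective concrete, a common nonparametric proxy is a k-nearest-neighbor entropy estimate computed over the states visited by *all* policies pooled together: a population whose members cover distinct regions of the state space yields larger k-NN distances, and hence a higher entropy score, than a population crowded into one region. The sketch below is illustrative only — the function name `knn_state_entropy`, the choice of k, and the toy 2-D state distributions are assumptions for demonstration, not the paper's actual estimator or experimental setup.

```python
import numpy as np

def knn_state_entropy(states, k=3):
    """Kozachenko-Leonenko-style k-NN entropy proxy (up to additive constants):
    larger average log-distance to the k-th nearest neighbor => higher entropy."""
    # Pairwise Euclidean distances between all pooled states.
    d = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=-1)
    # k-th nearest neighbor distance per state (index 0 after sorting is self).
    knn_dist = np.sort(d, axis=1)[:, k]
    return np.mean(np.log(knn_dist + 1e-8))

rng = np.random.default_rng(0)
# Hypothetical population A: three policies all exploring the same small region.
clustered = rng.normal(0.0, 0.1, size=(300, 2))
# Hypothetical population B: three policies covering distinct regions.
spread = np.concatenate([rng.normal(c, 0.1, size=(100, 2))
                         for c in ([-2.0, 0.0], [0.0, 2.0], [2.0, 0.0])])
# The diverse population induces higher collective state entropy.
print(knn_state_entropy(spread) > knn_state_entropy(clustered))
```

Pooling states before estimating entropy is what makes the objective population-level: a policy is rewarded for visiting states its peers do not, which drives the specialization into heterogeneous exploration strategies described above.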