🤖 AI Summary
This work proposes K-Myriad, a novel framework that addresses the limitations of conventional parallel reinforcement learning methods, which typically accelerate only a single policy and struggle to balance exploration diversity with efficiency. K-Myriad introduces, for the first time, population-level state entropy maximization into unsupervised parallel exploration by co-training a cohort of heterogeneous agents. This approach automatically generates a diverse set of high-quality policies that serve as effective initializations for subsequent reinforcement learning. By integrating unsupervised learning, multi-agent collaboration, and state entropy optimization, the method substantially enhances both policy diversity and sample efficiency in high-dimensional continuous control tasks.
📝 Abstract
Parallelization in Reinforcement Learning is typically employed to speed up the training of a single policy, with multiple workers collecting experience from an identical sampling distribution. This common design limits the potential of parallelization by neglecting the advantages of diverse exploration strategies. We propose K-Myriad, a scalable and unsupervised method that maximizes the collective state entropy induced by a population of parallel policies. By cultivating a portfolio of specialized exploration strategies, K-Myriad provides a robust initialization for Reinforcement Learning, leading to both higher training efficiency and the discovery of heterogeneous solutions. Experiments on high-dimensional continuous control tasks with large-scale parallelization demonstrate that K-Myriad learns a broad set of distinct policies, highlighting its effectiveness for collective exploration and paving the way towards novel parallelization strategies.
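To make the "collective state entropy" objective concrete, a common nonparametric proxy is a k-nearest-neighbor entropy estimate computed over the states visited by *all* policies pooled together: a population whose members cover distinct regions of the state space yields larger k-NN distances, and hence a higher entropy score, than a population crowded into one region. The sketch below is illustrative only — the function name `knn_state_entropy`, the choice of k, and the toy 2-D state distributions are assumptions for demonstration, not the paper's actual estimator or experimental setup.

```python
import numpy as np

def knn_state_entropy(states, k=3):
    """Kozachenko-Leonenko-style k-NN entropy proxy (up to additive constants):
    larger average log-distance to the k-th nearest neighbor => higher entropy."""
    # Pairwise Euclidean distances between all pooled states.
    d = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=-1)
    # k-th nearest neighbor distance per state (index 0 after sorting is self).
    knn_dist = np.sort(d, axis=1)[:, k]
    return np.mean(np.log(knn_dist + 1e-8))

rng = np.random.default_rng(0)
# Hypothetical population A: three policies all exploring the same small region.
clustered = rng.normal(0.0, 0.1, size=(300, 2))
# Hypothetical population B: three policies covering distinct regions.
spread = np.concatenate([rng.normal(c, 0.1, size=(100, 2))
                         for c in ([-2.0, 0.0], [0.0, 2.0], [2.0, 0.0])])
# The diverse population induces higher collective state entropy.
print(knn_state_entropy(spread) > knn_state_entropy(clustered))
```

Pooling states before estimating entropy is what makes the objective population-level: a policy is rewarded for visiting states its peers do not, which drives the specialization into heterogeneous exploration strategies described above.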