Accelerating Goal-Conditioned RL Algorithms and Research

📅 2024-08-20
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
📄 PDF
🤖 AI Summary
Self-supervised goal-conditioned reinforcement learning (GCRL) has long suffered from slow simulation data acquisition and unstable training, hindering its broad adoption. This paper introduces JaxGCRL: the first high-performance JAX library and benchmark suite specifically designed for GCRL. It integrates GPU-accelerated replay buffers, parallel vectorized environments, and a stable contrastive RL framework. We systematically evaluate key design choices—including contrastive learning, vector-quantized goal representations, and self-supervised goal sampling—under unified experimental conditions. Empirical results demonstrate up to 22× speedup over prior implementations, enabling million-step training within minutes on a single GPU. JaxGCRL facilitates rapid iteration and reproducible evaluation across diverse, challenging environments. By significantly lowering the barrier to GCRL research, it provides empirically grounded guidance for algorithmic development and fosters standardized, scalable experimentation.
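The "GPU-accelerated replay buffer" idea can be illustrated with a minimal sketch (function and field names here are hypothetical, not JaxGCRL's actual API): storage is kept as JAX device arrays and updated with functional index updates, so inserting and sampling transitions never round-trips through host memory.

```python
from functools import partial

import jax
import jax.numpy as jnp

def make_buffer(capacity, obs_dim):
    """All storage lives on the device as JAX arrays."""
    return {
        "obs": jnp.zeros((capacity, obs_dim)),
        "ptr": jnp.array(0),   # next write position (monotonically increasing)
        "size": jnp.array(0),  # number of valid entries, capped at capacity
    }

@jax.jit
def add(buffer, obs):
    """Circular insert via .at[...].set(...): builds the updated buffer on-device."""
    capacity = buffer["obs"].shape[0]
    i = buffer["ptr"] % capacity
    return {
        "obs": buffer["obs"].at[i].set(obs),
        "ptr": buffer["ptr"] + 1,
        "size": jnp.minimum(buffer["size"] + 1, capacity),
    }

@partial(jax.jit, static_argnames="batch_size")
def sample(buffer, key, batch_size):
    """Uniform on-device sampling; composable (jit-able) with the learner update."""
    idx = jax.random.randint(key, (batch_size,), 0, buffer["size"])
    return buffer["obs"][idx]
```

Because every operation is a pure function on device arrays, the whole collect-store-sample-update loop can be compiled into a single program, which is where much of the reported speedup comes from.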

📝 Abstract
Self-supervision has the potential to transform reinforcement learning (RL), paralleling the breakthroughs it has enabled in other areas of machine learning. While self-supervised learning in other domains aims to find patterns in a fixed dataset, self-supervised goal-conditioned reinforcement learning (GCRL) agents discover new behaviors by learning from the goals achieved during unstructured interaction with the environment. However, these methods have failed to see similar success, both due to a lack of data from slow environment simulations as well as a lack of stable algorithms. We take a step toward addressing both of these issues by releasing a high-performance codebase and benchmark (JaxGCRL) for self-supervised GCRL, enabling researchers to train agents for millions of environment steps in minutes on a single GPU. By utilizing GPU-accelerated replay buffers, environments, and a stable contrastive RL algorithm, we reduce training time by up to $22\times$. Additionally, we assess key design choices in contrastive RL, identifying those that most effectively stabilize and enhance training performance. With this approach, we provide a foundation for future research in self-supervised GCRL, enabling researchers to quickly iterate on new ideas and evaluate them in diverse and challenging environments. Website + Code: https://github.com/MichalBortkiewicz/JaxGCRL
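The "GPU-accelerated ... environments" in the abstract refers to batching a pure environment step function across many simulator states at once. A minimal sketch with a toy point-mass environment (illustrative only; JaxGCRL's actual environments are more involved):

```python
import jax
import jax.numpy as jnp

def step(state, action):
    """Toy point-mass dynamics for a single environment: a pure function."""
    new_state = state + 0.1 * action
    reward = -jnp.linalg.norm(new_state)  # denser reward closer to the origin
    return new_state, reward

# vmap turns the single-environment step into a batched step over N environments;
# jit then compiles the batched step so all environments advance on-device together.
batched_step = jax.jit(jax.vmap(step))

num_envs = 1024
states = jnp.zeros((num_envs, 2))
actions = jnp.ones((num_envs, 2))
states, rewards = batched_step(states, actions)
```

Because the per-environment step is a pure function, scaling from 1 to thousands of parallel environments is a one-line `vmap`, which is what removes the data-collection bottleneck the abstract describes.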
Problem

Research questions and friction points this paper is trying to address.

Accelerating self-supervised goal-conditioned RL training
Addressing data scarcity in slow environment simulations
Stabilizing algorithms for contrastive RL performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

GPU-accelerated replay buffers speed training
Stable contrastive RL algorithm enhances performance
High-performance JaxGCRL codebase enables rapid research
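The contrastive RL algorithm mentioned above trains the critic like a contrastive representation learner: state-action embeddings should score high against the goals reached on the same trajectory (the diagonal of a batch) and low against goals from other trajectories. A minimal InfoNCE-style sketch (a simplification for illustration, not JaxGCRL's exact loss):

```python
import jax
import jax.numpy as jnp

def contrastive_critic_loss(sa_embed, goal_embed):
    """sa_embed: (B, d) embeddings of (state, action) pairs; goal_embed: (B, d)
    embeddings of the goals reached on the matching trajectories. Row i of each
    array comes from the same trajectory, so the positives sit on the diagonal."""
    logits = sa_embed @ goal_embed.T        # (B, B) similarity matrix
    labels = jnp.arange(logits.shape[0])    # diagonal indices are the positives
    # Symmetric InfoNCE: classify the correct goal for each (s, a), and vice versa.
    loss_sa = -jnp.mean(jax.nn.log_softmax(logits, axis=1)[labels, labels])
    loss_g = -jnp.mean(jax.nn.log_softmax(logits, axis=0)[labels, labels])
    return 0.5 * (loss_sa + loss_g)
```

Since every goal in the batch serves as a negative for every other row, no separate negative-sampling machinery is needed, which is part of what makes the approach stable and GPU-friendly.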
Michal Bortkiewicz
Warsaw University of Technology
Wladek Palucki
University of Warsaw
Vivek Myers
UC Berkeley
Reinforcement Learning · Robotics · Human-AI Collaboration · Artificial Intelligence
Tadeusz Dziarmaga
Jagiellonian University
Tomasz Arczewski
Jagiellonian University
Łukasz Kuciński
University of Warsaw, Polish Academy of Sciences, IDEAS NCBR
Benjamin Eysenbach
Princeton University
Reinforcement Learning