Goal-Conditioned Agents that Learn Everything All at Once

📅 2026-05-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two problems in goal-conditioned reinforcement learning: conventional agents discard most of the information in a trajectory by updating only with respect to the commanded goal, and all-goals learning scales poorly because naive relabelling is computationally expensive. The authors propose LEO (Learning Everything all at Once), which outputs value estimates and actions for every goal in a single forward pass, enabling efficient, parallel off-policy updates for all goals. They further show that the approach becomes more powerful when LEO is used as a teacher network rather than as a direct actor. In experiments, LEO significantly outperforms other methods on goal-conditioned Craftax and is competitive with existing baselines on continuous control environments, while achieving a >250x speed-up over all-goals relabelling.

📝 Abstract

A goal-conditioned reinforcement learning agent exploring an environment will see a wealth of information throughout a trajectory, most of which is discarded when only performing on-policy updates with respect to the commanded goal. All-goals learning, where each transition is used for learning off-policy with respect to every goal, allows agents to extract maximal information; however, it is usually computationally infeasible when done via naive relabelling. This can be overcome by jointly outputting values and actions for every goal at once, allowing for efficient, parallel all-goals updates with a single pass through the network, in a process we call Learning Everything all at Once (LEO). We show that this approach significantly outperforms other methods on goal-conditioned Craftax and is competitive with existing baselines on continuous control environments, while achieving a >250x speed-up compared to all-goals relabelling. We then show that this approach can be made even more powerful by using LEO as a teacher network, rather than a direct actor. We hope that, by unlocking all-goals learning at scale, LEO can serve as a useful tool for RL practitioners in complex environments. We open-source our code.
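
The idea of outputting values for every goal at once can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the tiny MLP, the discrete action space, the sparse "goal achieved" reward, the Q-learning-style target, and the names `init_params`, `q_all_goals`, and `all_goals_td_loss` are all assumptions made for illustration. The point it shows is that a network with one output per (goal, action) pair yields TD targets for every goal from one forward pass per batch of states, instead of one relabelled pass per goal.

```python
import numpy as np

def init_params(rng, state_dim, n_goals, n_actions, hidden=32):
    """Hypothetical tiny MLP whose head has one Q-value per (goal, action) pair."""
    return {
        "W1": rng.normal(0.0, 0.1, (state_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, (hidden, n_goals * n_actions)),
        "b2": np.zeros(n_goals * n_actions),
    }

def q_all_goals(params, states, n_goals, n_actions):
    """One forward pass returning Q[b, g, a] for every goal g and action a."""
    h = np.tanh(states @ params["W1"] + params["b1"])
    out = h @ params["W2"] + params["b2"]
    return out.reshape(states.shape[0], n_goals, n_actions)

def all_goals_td_loss(params, states, actions, next_states, achieved, gamma=0.99):
    """Off-policy TD loss over all goals at once.

    achieved: (B, G) array, 1.0 where the transition reached goal g.
    Assumption: reward is 1 on reaching a goal, and that goal's episode ends.
    """
    n_goals = achieved.shape[1]
    n_actions = params["b2"].size // n_goals
    q = q_all_goals(params, states, n_goals, n_actions)            # (B, G, A)
    q_next = q_all_goals(params, next_states, n_goals, n_actions)  # (B, G, A)
    batch = np.arange(states.shape[0])
    q_taken = q[batch, :, actions]                                  # (B, G)
    # Every goal gets its own target from the same transition.
    # A real agent would treat the targets as constants (e.g. a target network).
    targets = achieved + gamma * (1.0 - achieved) * q_next.max(axis=-1)
    loss = np.mean((q_taken - targets) ** 2)
    return loss, targets

# Usage with made-up data: 5 transitions, 3 goals, 2 actions.
rng = np.random.default_rng(0)
params = init_params(rng, state_dim=4, n_goals=3, n_actions=2)
states = rng.normal(size=(5, 4))
next_states = rng.normal(size=(5, 4))
actions = rng.integers(0, 2, size=5)
achieved = np.zeros((5, 3))
achieved[0, 1] = 1.0  # transition 0 happened to reach goal 1
loss, targets = all_goals_td_loss(params, states, actions, next_states, achieved)
print(targets.shape)  # (5, 3): one target per transition per goal
```

Under naive relabelling, a goal-conditioned network would instead be evaluated once per (transition, goal) pair, so the cost grows with the number of goals; here the goal dimension is part of the output and is handled by a single batched pass.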

Problem

Research questions and friction points this paper is trying to address.

goal-conditioned reinforcement learning

all-goals learning

off-policy learning

sample efficiency

computational feasibility

Innovation

Methods, ideas, or system contributions that make the work stand out.

goal-conditioned reinforcement learning

all-goals learning

efficient off-policy updates