Dense and Diverse Goal Coverage in Multi Goal Reinforcement Learning

📅 2025-10-29
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenge of sparse policy coverage and uneven access to multiple goal states in multi-goal reinforcement learning. The authors propose a novel algorithm that does not require a pre-specified goal distribution. Their method jointly optimizes expected return and the uniformity of the marginal distribution over goal states by dynamically constructing an adaptive reward function and iteratively updating a mixture of policies, integrating ideas from offline RL and entropy regularization. The paper provides theoretical convergence guarantees for both return maximization and goal-state distribution dispersion. Experiments on synthetic MDPs and standard benchmark environments demonstrate that the algorithm significantly improves goal-state coverage density and diversity while maintaining high task performance, establishing a new approach to robust policy learning in sparse-goal settings.
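A minimal sketch of the adaptive-reward idea described above: goal states that the current policy mixture rarely visits earn a larger shaping bonus. The function name, the logarithmic shaping form, and the `beta` coefficient are illustrative assumptions, not the paper's actual formula.

```python
import numpy as np

def adaptive_reward(base_reward, state, goal_visit_probs, is_goal, beta=1.0):
    """Reward shaped to favor under-visited goal states.

    goal_visit_probs maps each known goal state to the current mixture's
    estimated visitation probability. Non-goal states keep their base reward.
    """
    if not is_goal(state):
        return base_reward
    p = goal_visit_probs.get(state, 0.0)
    # Bonus shrinks as the mixture already covers this goal; eps avoids log(0).
    bonus = -beta * np.log(p + 1e-8)
    return base_reward + bonus
```

Recomputing this reward at each iteration, as the summary describes, keeps pushing new policies toward whichever goals the mixture currently neglects.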

📝 Abstract
Reinforcement Learning algorithms are primarily focused on learning a policy that maximizes expected return. As a result, the learned policy can exploit one or a few reward sources. However, in many natural situations, it is desirable to learn a policy that induces a dispersed marginal state distribution over rewarding states, while maximizing the expected return, which is typically tied to reaching a goal state. This aspect remains relatively unexplored. Existing techniques based on entropy regularization and intrinsic rewards use stochasticity to encourage exploration toward an optimal policy, which does not necessarily lead to a dispersed marginal state distribution over rewarding states. Other RL algorithms which match a target distribution assume the latter to be available a priori. This may be infeasible in large-scale systems where enumeration of all states is not possible and a state is determined to be a goal state only upon reaching it. We formalize the problem of maximizing the expected return while uniformly visiting the goal states as Multi Goal RL, in which an oracle classifier over the state space determines the goal states. We propose a novel algorithm that learns a high-return policy mixture with marginal state distribution dispersed over the set of goal states. Our algorithm is based on optimizing a custom RL reward which is computed at each iteration, based on the current policy mixture, for a set of sampled trajectories. The latter are used via an offline RL algorithm to update the policy mixture. We prove performance guarantees for our algorithm, showing efficient convergence bounds for optimizing a natural objective which captures the expected return as well as the dispersion of the marginal state distribution over the goal states. We design and perform experiments on synthetic MDPs and standard RL environments to evaluate the effectiveness of our algorithm.
Problem

Research questions and friction points this paper is trying to address.

Maximizing expected return while uniformly visiting goal states
Learning dispersed marginal state distribution over rewarding states
Addressing sparse goal coverage in multi-goal reinforcement learning
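One way to quantify the "dispersed marginal state distribution" named in these friction points is the normalized entropy of the empirical goal-visitation distribution. This metric is an illustrative stand-in, not necessarily the paper's exact dispersion objective.

```python
import numpy as np

def goal_dispersion(visits):
    """Normalized entropy of the empirical goal-visitation distribution.

    visits: per-goal visit counts (at least two goals, total > 0).
    Returns a value in [0, 1]; 1 means perfectly uniform coverage,
    0 means all visits concentrate on a single goal.
    """
    counts = np.asarray(visits, dtype=float)
    p = counts / counts.sum()
    # Treat 0 * log 0 as 0 for goals never visited.
    h = -np.sum(np.where(p > 0, p * np.log(p), 0.0))
    return h / np.log(len(counts))  # normalize by the maximum entropy
```

A policy that exploits one reward source scores near 0 here even with high return, which is exactly the failure mode the paper targets.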
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimizes custom reward for policy mixture dispersion
Uses offline RL with sampled trajectory updates
Ensures uniform goal coverage while maximizing return
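The three contributions above can be read as one iterative loop: sample trajectories under the current mixture, reshape rewards to favor under-visited goals, fit a new policy offline on the relabeled batch, and append it to the mixture. The sketch below is one plausible reading under stated assumptions; `env.rollout`, `train_offline`, and `is_goal` are assumed interfaces, not the paper's API, and the simple inverse-count bonus stands in for the paper's custom reward.

```python
import random
from collections import Counter

def train_policy_mixture(env, train_offline, is_goal, iters=5, episodes=20):
    """Iteratively grow a uniformly-mixed set of policies.

    env.rollout(policy) -> list of (state, action, reward) tuples;
    train_offline(batch) -> a new policy fit to the relabeled batch.
    """
    mixture = []                 # policies, mixed uniformly at sampling time
    goal_visits = Counter()      # empirical goal-visitation counts so far
    for _ in range(iters):
        batch = []
        for _ in range(episodes):
            policy = random.choice(mixture) if mixture else None
            for s, a, r in env.rollout(policy):
                if is_goal(s):
                    goal_visits[s] += 1
                    # Bonus decays for goals the mixture already covers.
                    r = r + 1.0 / (1.0 + goal_visits[s])
                batch.append((s, a, r))
        mixture.append(train_offline(batch))   # offline RL update
    return mixture
```

Each round the relabeled rewards shift toward neglected goals, so later policies in the mixture cover goal states the earlier ones missed.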