AI Summary
To address the low sample efficiency of goal-conditioned reinforcement learning (GCRL) under sparse rewards, this paper proposes Hindsight Goal-conditioned Regularization (HGR), a novel hindsight experience utilization mechanism that integrates goal generation and action regularization. Unlike conventional Hindsight Experience Replay (HER), which only relabels trajectories with achieved goals, HGR generates semantically plausible goals and enforces action consistency constraints, improving the plausibility and generalizability of goal-action pairs. It additionally introduces hindsight self-imitation regularization to strengthen the policy's reuse of successful trajectories. By embedding these components within the HER framework, HGR achieves more efficient experience replay and policy optimization. Experimental results demonstrate that HGR significantly improves sample efficiency across diverse navigation and manipulation tasks, consistently outperforming state-of-the-art GCRL baselines.
Abstract
Goal-conditioned reinforcement learning (GCRL) with sparse rewards remains a fundamental challenge in reinforcement learning. While hindsight experience replay (HER) has shown promise by relabeling collected trajectories with achieved goals, we argue that trajectory relabeling alone does not fully exploit the available experiences in off-policy GCRL methods, resulting in limited sample efficiency. In this paper, we propose Hindsight Goal-conditioned Regularization (HGR), a technique that generates action regularization priors based on hindsight goals. When combined with hindsight self-imitation regularization (HSR), our approach enables off-policy RL algorithms to maximize experience utilization. Compared to existing GCRL methods that employ HER and self-imitation techniques, our hindsight regularizations achieve substantially more efficient sample reuse and the strongest performance, which we empirically demonstrate on a suite of navigation and manipulation tasks.
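For readers unfamiliar with the relabeling step that HGR builds on, the sketch below shows generic HER "future"-strategy relabeling under a sparse reward. This is an illustrative assumption, not the paper's HGR method or the authors' code: the tuple layout, `reward_fn`, and `k` are hypothetical names chosen for the example.

```python
import random

def her_relabel(trajectory, reward_fn, k=4):
    """Generic HER 'future' relabeling sketch (illustrative, not the paper's HGR).

    trajectory: list of (state, action, next_state, achieved_goal) tuples.
    reward_fn(achieved_goal, goal) -> float: sparse reward, e.g.
        0.0 if the achieved goal matches the relabeled goal, else -1.0.
    Returns a list of relabeled transitions (state, action, next_state, goal, reward).
    """
    relabeled = []
    for t, (s, a, s_next, ag) in enumerate(trajectory):
        # Substitute goals are achieved goals from the current or later steps,
        # so some relabeled transitions become "successful" under the new goal.
        future_goals = [step[3] for step in trajectory[t:]]
        for g in random.sample(future_goals, min(k, len(future_goals))):
            relabeled.append((s, a, s_next, g, reward_fn(ag, g)))
    return relabeled
```

HGR, as described above, goes beyond this relabeling by additionally deriving action regularization priors from the hindsight goals, rather than only rewriting the goal and reward fields.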