Reachability Weighted Offline Goal-conditioned Resampling

📅 2025-06-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
In offline goal-conditioned reinforcement learning, fixed datasets lack explicit goal annotations, so uniform sampling introduces many unreachable state-goal-action tuples that impair policy generalization. To address this, we propose a reachability-aware weighted resampling mechanism: (i) a positive-unlabeled (PU) learning-based reachability classifier dynamically models goal reachability without requiring expert annotations; (ii) this classifier is coupled with goal-conditioned Q-value estimation and integrates seamlessly into mainstream offline RL frameworks (e.g., CQL, BCQ) as a modular, plug-and-play component. We evaluate the method on six robotic manipulation tasks, demonstrating consistent improvements, including a nearly 50% performance gain over the baseline on the HandBlock-Z task, while effectively suppressing interference from unreachable samples.

📝 Abstract
Offline goal-conditioned reinforcement learning (RL) relies on fixed datasets where many potential goals share the same state and action spaces. However, these potential goals are not explicitly represented in the collected trajectories. To learn a generalizable goal-conditioned policy, it is common to sample goals and state-action pairs uniformly using dynamic programming methods such as Q-learning. Uniform sampling, however, requires an intractably large dataset to cover all possible combinations and creates many unreachable state-goal-action pairs that degrade policy performance. Our key insight is that sampling should favor transitions that enable goal achievement. To this end, we propose Reachability Weighted Sampling (RWS). RWS uses a reachability classifier trained via positive-unlabeled (PU) learning on goal-conditioned state-action values. The classifier maps these values to a reachability score, which is then used as a sampling priority. RWS is a plug-and-play module that integrates seamlessly with standard offline RL algorithms. Experiments on six complex simulated robotic manipulation tasks, including those with a robot arm and a dexterous hand, show that RWS significantly improves performance. In one notable case, performance on the HandBlock-Z task improved by nearly 50 percent relative to the baseline. These results indicate the effectiveness of reachability-weighted sampling.
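The core idea of RWS, mapping goal-conditioned values to a reachability score and using it as a sampling priority, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the sigmoid mapping, the `temperature` parameter, and the direct use of raw Q-values (rather than the trained PU classifier's output) are simplifying assumptions.

```python
import numpy as np

def reachability_weighted_sample(q_values, rng, batch_size=4, temperature=1.0):
    """Sample transition indices with probability proportional to a
    reachability score derived from goal-conditioned Q-values.

    The sigmoid squashing and `temperature` are illustrative choices;
    in the paper, a PU-learned classifier produces the score.
    """
    scores = 1.0 / (1.0 + np.exp(-np.asarray(q_values) / temperature))
    probs = scores / scores.sum()  # normalize scores into sampling priorities
    return rng.choice(len(q_values), size=batch_size, p=probs)

rng = np.random.default_rng(0)
q = [-3.0, -0.5, 1.0, 2.5]  # higher Q suggests the goal is more reachable
idx = reachability_weighted_sample(q, rng, batch_size=2)
```

Because the module only reweights which transitions are drawn from the replay buffer, it can wrap the sampling step of any standard offline RL algorithm without touching its loss functions.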
Problem

Research questions and friction points this paper is trying to address.

Improves goal-conditioned RL by prioritizing reachable transitions
Addresses dataset limitations in offline goal-conditioned reinforcement learning
Enhances policy performance using reachability-weighted sampling (RWS)
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reachability Weighted Sampling prioritizes goal-achieving transitions
Uses reachability classifier with positive-unlabeled learning
Plug-and-play module enhancing offline RL algorithms
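The PU-learning component above treats transitions known to achieve their goal (e.g., via hindsight relabeling) as positives and the rest of the dataset as unlabeled. A common way to train such a classifier is a non-negative PU risk estimator; the sketch below computes that risk with a logistic loss. The class `prior` value and the use of raw logits are illustrative assumptions, not details from the paper.

```python
import numpy as np

def nnpu_loss(scores_pos, scores_unl, prior=0.5):
    """Non-negative PU risk with logistic loss.

    scores_pos: classifier logits on known-reachable (positive) samples.
    scores_unl: logits on unlabeled samples.
    prior: assumed fraction of positives among unlabeled data (illustrative).
    """
    loss = lambda z, y: np.log1p(np.exp(-y * z))  # logistic loss
    r_p_pos = loss(scores_pos, +1).mean()  # positives classified as positive
    r_p_neg = loss(scores_pos, -1).mean()  # positives classified as negative
    r_u_neg = loss(scores_unl, -1).mean()  # unlabeled classified as negative
    # Clamp the estimated negative-class risk at zero to avoid overfitting.
    return prior * r_p_pos + max(0.0, r_u_neg - prior * r_p_neg)

pos = np.array([1.5, 2.0])   # logits on relabeled goal-achieving transitions
unl = np.array([-0.5, 0.3])  # logits on uniformly sampled transitions
risk = nnpu_loss(pos, unl, prior=0.4)
```

Minimizing this risk with respect to the classifier's parameters yields reachability scores without requiring explicit negative labels.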
Wenyan Yang
Aalto University
Computer Vision · Imitation Learning · Reinforcement Learning
J. Pajarinen
Department of Electrical Engineering and Automation, Aalto University