Graph-Assisted Stitching for Offline Hierarchical Reinforcement Learning

📅 2025-06-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the inefficiency of subgoal sequence generation and the difficulty of stitching state transitions across trajectories in offline hierarchical reinforcement learning (HRL) for long-horizon tasks, this work departs from conventional high-level policy learning paradigms and, for the first time, formulates subgoal selection as a graph search problem. We propose Temporal Distance Representation (TDR) to construct a state transition graph and introduce a Temporal Efficiency (TE) metric to prune spurious edges, enabling semantically coherent state clustering and shortest-path subgoal planning. Our method integrates spectral clustering, Dijkstra’s algorithm, and graph neural network (GNN)-based representation learning, coupling them with low-level policies inspired by BCQ and BEAR. Evaluated on locomotion, navigation, and manipulation benchmarks, our approach consistently outperforms existing offline HRL methods, achieving a score of 88.3 on the most stitching-critical task—over 87× higher than the prior SOTA (1.0).
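The clustering step described above merges semantically similar states from different trajectories into shared graph nodes in TDR space. The paper uses spectral clustering; the greedy threshold-based grouping below is only an illustrative stand-in, and the function name, 1-D embeddings, and `eps` threshold are all hypothetical choices made for the sketch.

```python
def cluster_tdr_states(embeddings, eps=1.0):
    """Greedy stand-in for the clustering step (the paper uses spectral
    clustering; this simpler grouping is purely illustrative).

    States whose TDR embeddings lie within `eps` of an existing cluster
    centre are merged into that cluster (graph node), so semantically
    similar states from different trajectories share a node.
    """
    centers, labels = [], []
    for x in embeddings:
        for i, c in enumerate(centers):
            if abs(x - c) <= eps:  # 1-D toy embedding for clarity
                labels.append(i)
                break
        else:
            centers.append(x)
            labels.append(len(centers) - 1)
    return labels

# Two trajectories visiting similar regions collapse into shared nodes,
# which is what makes cross-trajectory stitching possible.
traj_a = [0.0, 2.5, 5.1]
traj_b = [0.4, 2.9, 8.0]
print(cluster_tdr_states(traj_a + traj_b, eps=1.0))  # [0, 1, 2, 0, 1, 3]
```

States from both toy trajectories land in nodes 0 and 1, so a path through the graph can switch between trajectories at those nodes.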

📝 Abstract
Existing offline hierarchical reinforcement learning methods rely on high-level policy learning to generate subgoal sequences. However, their efficiency degrades as task horizons increase, and they lack effective strategies for stitching useful state transitions across different trajectories. We propose Graph-Assisted Stitching (GAS), a novel framework that formulates subgoal selection as a graph search problem rather than learning an explicit high-level policy. By embedding states into a Temporal Distance Representation (TDR) space, GAS clusters semantically similar states from different trajectories into unified graph nodes, enabling efficient transition stitching. A shortest-path algorithm is then applied to select subgoal sequences within the graph, while a low-level policy learns to reach the subgoals. To improve graph quality, we introduce the Temporal Efficiency (TE) metric, which filters out noisy or inefficient transition states, significantly enhancing task performance. GAS outperforms prior offline HRL methods across locomotion, navigation, and manipulation tasks. Notably, in the most stitching-critical task, it achieves a score of 88.3, dramatically surpassing the previous state-of-the-art score of 1.0. Our source code is available at: https://github.com/qortmdgh4141/GAS.
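The shortest-path planning over the stitched graph can be sketched with a standard Dijkstra search. Everything here is a toy: the node names, edge costs, and the `dijkstra_subgoals` helper are hypothetical, with edge costs standing in for temporal distances between state clusters.

```python
import heapq

def dijkstra_subgoals(edges, start, goal):
    """Shortest-path subgoal sequence over a cluster-level transition graph.

    edges: dict mapping node -> list of (neighbor, cost) pairs, where each
    node is a cluster of temporally close states and cost approximates the
    temporal distance between clusters.
    """
    dist = {start: 0.0}
    prev = {}
    pq = [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in edges.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(pq, (nd, v))
    if goal not in dist:
        return None  # goal unreachable in the stitched graph
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return list(reversed(path))

# Toy graph: nodes A-D are state clusters produced by TDR clustering.
graph = {
    "A": [("B", 1.0), ("C", 4.0)],
    "B": [("C", 1.0), ("D", 5.0)],
    "C": [("D", 1.0)],
}
print(dijkstra_subgoals(graph, "A", "D"))  # ['A', 'B', 'C', 'D']
```

The returned node sequence would serve as the subgoal sequence that the low-level policy is trained to follow, replacing an explicit learned high-level policy.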
Problem

Research questions and friction points this paper is trying to address.

Improves efficiency in long-horizon offline hierarchical reinforcement learning
Enhances stitching of useful state transitions across trajectories
Replaces high-level policy learning with graph-based subgoal selection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Formulates subgoal selection as graph search
Uses Temporal Distance Representation for clustering
Introduces Temporal Efficiency metric for graph quality
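The TE metric above filters noisy or inefficient transitions before planning. The paper does not spell out its formula in this summary, so the sketch below assumes a hypothetical ratio form (temporal-distance progress divided by environment steps consumed); the function name, edge tuple layout, and threshold are all illustrative assumptions.

```python
def prune_by_temporal_efficiency(edges, threshold=0.5):
    """Illustrative Temporal Efficiency (TE)-style filter (hypothetical form).

    Each edge is (u, v, td_gain, steps): td_gain is the temporal-distance
    progress the transition makes, steps the environment steps it consumed.
    Edges whose efficiency ratio falls below the threshold are treated as
    noisy or roundabout transitions and dropped from the graph.
    """
    kept = []
    for u, v, td_gain, steps in edges:
        te = td_gain / max(steps, 1)  # progress per step
        if te >= threshold:
            kept.append((u, v, td_gain, steps))
    return kept

edges = [
    ("s0", "s1", 4.0, 5),   # te = 0.8 -> kept
    ("s1", "s2", 1.0, 10),  # te = 0.1 -> pruned as inefficient
]
print(prune_by_temporal_efficiency(edges))  # [('s0', 's1', 4.0, 5)]
```

Pruning such low-efficiency edges keeps shortest-path planning from routing subgoals through detours that the low-level policy would struggle to execute.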
Seungho Baek
Department of Computer Science and Engineering, Sungkyunkwan University, Suwon, Republic of Korea
Taegeon Park
Department of Artificial Intelligence, Sungkyunkwan University, Suwon, Republic of Korea
Jongchan Park
Lunit Inc.
Seungjun Oh
Sungkyunkwan University
Deep Learning · Computer Vision
Yusung Kim
Department of Computer Science and Engineering, Sungkyunkwan University, Suwon, Republic of Korea; Department of Artificial Intelligence, Sungkyunkwan University, Suwon, Republic of Korea