Graph-Assisted Stitching for Offline Hierarchical Reinforcement Learning

📅 2025-06-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the inefficiency of subgoal sequence generation and the difficulty of stitching state transitions across trajectories in offline hierarchical reinforcement learning (HRL) for long-horizon tasks, this work departs from conventional high-level policy learning paradigms and, for the first time, formulates subgoal selection as a graph search problem. We propose Temporal Distance Representation (TDR) to construct a state transition graph and introduce a Temporal Efficiency (TE) metric to prune spurious edges, enabling semantically coherent state clustering and shortest-path subgoal planning. Our method integrates spectral clustering, Dijkstra’s algorithm, and graph neural network (GNN)-based representation learning, coupling them with low-level policies inspired by BCQ and BEAR. Evaluated on locomotion, navigation, and manipulation benchmarks, our approach consistently outperforms existing offline HRL methods, achieving a score of 88.3 on the most stitching-critical task—over 87× higher than the prior SOTA (1.0).
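The clustering step described above merges semantically similar states from different trajectories into shared graph nodes in TDR space. The paper uses spectral clustering; the greedy threshold-based grouping below is only an illustrative stand-in, and the function name, 1-D embeddings, and `eps` threshold are all hypothetical choices made for the sketch.

```python
def cluster_tdr_states(embeddings, eps=1.0):
    """Greedy stand-in for the clustering step (the paper uses spectral
    clustering; this simpler grouping is purely illustrative).

    States whose TDR embeddings lie within `eps` of an existing cluster
    centre are merged into that cluster (graph node), so semantically
    similar states from different trajectories share a node.
    """
    centers, labels = [], []
    for x in embeddings:
        for i, c in enumerate(centers):
            if abs(x - c) <= eps:  # 1-D toy embedding for clarity
                labels.append(i)
                break
        else:
            centers.append(x)
            labels.append(len(centers) - 1)
    return labels

# Two trajectories visiting similar regions collapse into shared nodes,
# which is what makes cross-trajectory stitching possible.
traj_a = [0.0, 2.5, 5.1]
traj_b = [0.4, 2.9, 8.0]
print(cluster_tdr_states(traj_a + traj_b, eps=1.0))  # [0, 1, 2, 0, 1, 3]
```

States from both toy trajectories land in nodes 0 and 1, so a path through the graph can switch between trajectories at those nodes.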

📝 Abstract
Existing offline hierarchical reinforcement learning methods rely on high-level policy learning to generate subgoal sequences. However, their efficiency degrades as task horizons increase, and they lack effective strategies for stitching useful state transitions across different trajectories. We propose Graph-Assisted Stitching (GAS), a novel framework that formulates subgoal selection as a graph search problem rather than learning an explicit high-level policy. By embedding states into a Temporal Distance Representation (TDR) space, GAS clusters semantically similar states from different trajectories into unified graph nodes, enabling efficient transition stitching. A shortest-path algorithm is then applied to select subgoal sequences within the graph, while a low-level policy learns to reach the subgoals. To improve graph quality, we introduce the Temporal Efficiency (TE) metric, which filters out noisy or inefficient transition states, significantly enhancing task performance. GAS outperforms prior offline HRL methods across locomotion, navigation, and manipulation tasks. Notably, in the most stitching-critical task, it achieves a score of 88.3, dramatically surpassing the previous state-of-the-art score of 1.0. Our source code is available at: https://github.com/qortmdgh4141/GAS.
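The shortest-path planning over the stitched graph can be sketched with a standard Dijkstra search. Everything here is a toy: the node names, edge costs, and the `dijkstra_subgoals` helper are hypothetical, with edge costs standing in for temporal distances between state clusters.

```python
import heapq

def dijkstra_subgoals(edges, start, goal):
    """Shortest-path subgoal sequence over a cluster-level transition graph.

    edges: dict mapping node -> list of (neighbor, cost) pairs, where each
    node is a cluster of temporally close states and cost approximates the
    temporal distance between clusters.
    """
    dist = {start: 0.0}
    prev = {}
    pq = [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in edges.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(pq, (nd, v))
    if goal not in dist:
        return None  # goal unreachable in the stitched graph
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return list(reversed(path))

# Toy graph: nodes A-D are state clusters produced by TDR clustering.
graph = {
    "A": [("B", 1.0), ("C", 4.0)],
    "B": [("C", 1.0), ("D", 5.0)],
    "C": [("D", 1.0)],
}
print(dijkstra_subgoals(graph, "A", "D"))  # ['A', 'B', 'C', 'D']
```

The returned node sequence would serve as the subgoal sequence that the low-level policy is trained to follow, replacing an explicit learned high-level policy.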
Problem

Research questions and friction points this paper is trying to address.

Improves efficiency in long-horizon offline hierarchical reinforcement learning
Enhances stitching of useful state transitions across trajectories
Replaces high-level policy learning with graph-based subgoal selection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Formulates subgoal selection as graph search
Uses Temporal Distance Representation for clustering
Introduces Temporal Efficiency metric for graph quality
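The TE metric above filters noisy or inefficient transitions before planning. The paper does not spell out its formula in this summary, so the sketch below assumes a hypothetical ratio form (temporal-distance progress divided by environment steps consumed); the function name, edge tuple layout, and threshold are all illustrative assumptions.

```python
def prune_by_temporal_efficiency(edges, threshold=0.5):
    """Illustrative Temporal Efficiency (TE)-style filter (hypothetical form).

    Each edge is (u, v, td_gain, steps): td_gain is the temporal-distance
    progress the transition makes, steps the environment steps it consumed.
    Edges whose efficiency ratio falls below the threshold are treated as
    noisy or roundabout transitions and dropped from the graph.
    """
    kept = []
    for u, v, td_gain, steps in edges:
        te = td_gain / max(steps, 1)  # progress per step
        if te >= threshold:
            kept.append((u, v, td_gain, steps))
    return kept

edges = [
    ("s0", "s1", 4.0, 5),   # te = 0.8 -> kept
    ("s1", "s2", 1.0, 10),  # te = 0.1 -> pruned as inefficient
]
print(prune_by_temporal_efficiency(edges))  # [('s0', 's1', 4.0, 5)]
```

Pruning such low-efficiency edges keeps shortest-path planning from routing subgoals through detours that the low-level policy would struggle to execute.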
Seungho Baek
Department of Computer Science and Engineering, Sungkyunkwan University, Suwon, Republic of Korea
Taegeon Park
Department of Artificial Intelligence, Sungkyunkwan University, Suwon, Republic of Korea
Jongchan Park
Lunit Inc.
Seungjun Oh
Sungkyunkwan University
Deep Learning · Computer Vision
Yusung Kim
Department of Computer Science and Engineering, Sungkyunkwan University, Suwon, Republic of Korea; Department of Artificial Intelligence, Sungkyunkwan University, Suwon, Republic of Korea