AI Summary
Offline goal-conditioned reinforcement learning (GCRL) suffers from poor performance on long-horizon tasks due to challenges in temporal credit assignment and error accumulation. To address this, we propose a test-time graph search method that requires no modification to training and no additional supervision: it constructs a weighted state-space graph using only a pretrained goal-conditioned value function, plans a subgoal sequence over this graph (e.g., via shortest-path search), and executes the plan segment-wise with a frozen policy. This is the first work to directly apply metric-guided graph search to the inference phase of offline GCRL, offering both a lightweight implementation and broad generality: it is compatible with arbitrary distance or cost signals. Evaluated on the OGBench benchmark, our approach consistently improves success rates across multiple base offline GCRL algorithms, demonstrating the effectiveness and general utility of test-time planning for offline long-horizon decision-making.
Abstract
Offline goal-conditioned reinforcement learning (GCRL) trains policies that reach user-specified goals at test time, providing a simple, unsupervised, domain-agnostic way to extract diverse behaviors from unlabeled, reward-free datasets. Nonetheless, long-horizon decision making remains difficult for GCRL agents due to temporal credit assignment and error accumulation, and the offline setting amplifies these effects. To alleviate this issue, we introduce Test-Time Graph Search (TTGS), a lightweight planning approach to solve the GCRL task. TTGS accepts any state-space distance or cost signal, builds a weighted graph over dataset states, and performs fast search to assemble a sequence of subgoals that a frozen policy executes. When the base learner is value-based, the distance is derived directly from the learned goal-conditioned value function, so no handcrafted metric is needed. TTGS requires no changes to training, no additional supervision, no online interaction, and no privileged information, and it runs entirely at inference. On the OGBench benchmark, TTGS improves success rates of multiple base learners on challenging locomotion tasks, demonstrating the benefit of simple metric-guided test-time planning for offline GCRL.
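The planning loop described above (build a weighted graph over dataset states from a learned distance signal, run shortest-path search, then hand the resulting subgoals to a frozen policy) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `distance` callable (e.g., `d(s, g) = -V(s, g)` from a goal-conditioned value function) and the k-nearest-neighbor graph construction are assumptions for the sketch.

```python
import heapq

def build_graph(states, distance, k=5):
    """Weighted k-NN graph over dataset states. `distance` is any
    (possibly asymmetric) state-space cost signal, e.g. derived from
    a pretrained goal-conditioned value function."""
    graph = {i: [] for i in range(len(states))}
    for i, s in enumerate(states):
        dists = sorted((distance(s, g), j)
                       for j, g in enumerate(states) if j != i)
        # Keep only the k cheapest outgoing edges from each state.
        graph[i] = [(j, d) for d, j in dists[:k]]
    return graph

def shortest_subgoal_path(graph, start, goal):
    """Dijkstra over the state graph; returns a list of node indices
    from start to goal, or None if the goal is unreachable."""
    dist = {start: 0.0}
    prev = {}
    pq = [(0.0, start)]
    visited = set()
    while pq:
        d, u = heapq.heappop(pq)
        if u in visited:
            continue
        visited.add(u)
        if u == goal:
            break
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(pq, (nd, v))
    if goal not in dist:
        return None
    path, node = [goal], goal
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1]
```

At test time, the returned node indices index into the dataset states; the frozen goal-conditioned policy is then conditioned on each subgoal in turn until it is reached, before advancing to the next one.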