Beyond Single-Step Updates: Reinforcement Learning of Heuristics with Limited-Horizon Search

📅 2025-11-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional heuristic methods in reinforcement learning for shortest-path problems rely on single-step Bellman updates, leading to localized and inconsistent state-value estimates. To address this, we propose a multi-step heuristic learning framework that integrates finite-horizon graph search with deep approximate value iteration. Our method anchors computation at the search frontier and propagates path information backward via bounded-depth search, enabling multi-step, globally consistent value correction. A neural network is trained end-to-end to approximate the heuristic function. Evaluated on diverse pathfinding benchmarks, our approach significantly improves both search efficiency and solution quality: average node expansions decrease by 37%, and optimal solution rates increase by 22% compared to single-step update baselines. These results validate the effectiveness and generalizability of multi-step search-guided heuristic learning.
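The single-step baseline the summary contrasts against updates a state's heuristic from its best neighbor alone. A minimal sketch of that update rule (the names `neighbors`, `edge_cost`, and `h` are illustrative, not taken from the paper's code):

```python
def single_step_update(state, neighbors, edge_cost, h):
    """One Bellman backup: h(s) <- min over neighbors s' of c(s, s') + h(s').

    `neighbors` maps a state to its successor states, `edge_cost` maps
    (state, successor) pairs to transition costs, and `h` holds the current
    heuristic estimates. Only information one step away is used, which is
    the locality the paper's multi-step method is designed to overcome.
    """
    return min(edge_cost[(state, n)] + h[n] for n in neighbors[state])
```

For example, with two successors at costs 1 and 4 and heuristic values 5 and 0.5, the update picks the cheaper total 4 + 0.5 = 4.5, even though the cost-1 edge looks locally attractive.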

📝 Abstract
Many sequential decision-making problems can be formulated as shortest-path problems, where the objective is to reach a goal state from a given starting state. Heuristic search is a standard approach for solving such problems, relying on a heuristic function to estimate the cost to the goal from any given state. Recent approaches leverage reinforcement learning to learn heuristics by applying deep approximate value iteration. These methods typically rely on single-step Bellman updates, where the heuristic of a state is updated based on its best neighbor and the corresponding edge cost. This work proposes a generalized approach that enhances both state sampling and heuristic updates by performing limited-horizon searches and updating each state's heuristic based on the shortest path to the search frontier, incorporating both edge costs and the heuristic values of frontier states.
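The abstract's multi-step update can be sketched as a bounded-depth uniform-cost search from the state: each path is followed until it either reaches a goal (whose cost-to-go is zero) or hits the depth horizon, and the new heuristic is the cheapest path cost to the frontier plus the frontier state's current heuristic. This is a rough illustration under assumed interfaces (`neighbors`, `edge_cost`, `h`, `is_goal` are hypothetical names, not the paper's API):

```python
import heapq

def limited_horizon_update(state, neighbors, edge_cost, h, horizon, is_goal):
    """Multi-step backup: h(s) <- min over frontier states f of
    (shortest path cost from s to f) + h(f), with h(goal) = 0.

    A uniform-cost search is expanded up to `horizon` steps deep; states at
    exactly that depth form the frontier. Goals found inside the horizon
    terminate their path with the path cost alone.
    """
    best = float("inf")
    pq = [(0.0, 0, state)]          # (path cost g, depth, node)
    seen = {state: 0.0}             # cheapest known g per node
    while pq:
        g, d, node = heapq.heappop(pq)
        if is_goal(node):
            best = min(best, g)     # goal inside horizon: cost-to-go is g
            continue
        if d == horizon:
            best = min(best, g + h[node])  # frontier: g + h(frontier)
            continue
        for nxt in neighbors[node]:
            ng = g + edge_cost[(node, nxt)]
            if nxt not in seen or ng < seen[nxt]:
                seen[nxt] = ng
                heapq.heappush(pq, (ng, d + 1, nxt))
    return best
```

With horizon 1 this reduces to the single-step Bellman update; larger horizons let path information from several steps away correct the estimate in one backup, which is the source of the "globally consistent" correction the summary describes.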
Problem

Research questions and friction points this paper is trying to address.

Learning heuristics for sequential decision-making via reinforcement learning
Enhancing heuristic updates using limited-horizon search methods
Improving state sampling and heuristic accuracy in shortest-path problems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Limited-horizon search for state sampling
Multi-step heuristic updates via shortest paths
Combining edge costs with frontier heuristics
Gal Hadar
Faculty of Computer and Information Science, Ben-Gurion University of the Negev
Forest Agostinelli
Assistant Professor at the University of South Carolina
Artificial Intelligence, Deep Learning, Reinforcement Learning, Heuristic Search, Logic
Shahaf S. Shperberg
Faculty of Computer and Information Science, Ben-Gurion University of the Negev