A* Search Without Expansions: Learning Heuristic Functions with Deep Q-Networks

📅 2021-02-08
🏛️ arXiv.org
📈 Citations: 19
Influential: 3
🤖 AI Summary
To address the computational bottleneck in A* search, where node generation and heuristic evaluation scale linearly with the size of the action space, this paper proposes Q* search, which integrates a deep Q-network (DQN) into heuristic search. Q* replaces explicit child-node generation: a single forward pass through the DQN predicts the sum of transition cost and heuristic value for every child of a node, so only one node is generated per iteration, drastically reducing computation and memory overhead. The authors prove that Q* retains solution optimality given a heuristic that neither overestimates the cost of a shortest path nor underestimates the transition cost. Evaluated on the Rubik's Cube formulated with a large action space of 1872 meta-actions, this 157-fold increase in action-space size costs Q* less than a 4-fold increase in computation time and less than a 3-fold increase in nodes generated; compared with A* search, Q* is up to 129 times faster and generates up to 1288 times fewer nodes. The core contribution is repurposing the DQN from policy learning to serving as the evaluation kernel of heuristic search, enabling efficient planning in massive action spaces.
📝 Abstract
Efficiently solving problems with large action spaces using A* search has been of importance to the artificial intelligence community for decades. This is because the computation and memory requirements of A* search grow linearly with the size of the action space. This burden becomes even more apparent when A* search uses a heuristic function learned by computationally expensive function approximators, such as deep neural networks. To address this problem, we introduce Q* search, a search algorithm that uses deep Q-networks to guide search in order to take advantage of the fact that the sum of the transition costs and heuristic values of the children of a node can be computed with a single forward pass through a deep Q-network without explicitly generating those children. This significantly reduces computation time and requires only one node to be generated per iteration. We use Q* search to solve the Rubik's cube when formulated with a large action space that includes 1872 meta-actions and find that this 157-fold increase in the size of the action space incurs less than a 4-fold increase in computation time and less than a 3-fold increase in number of nodes generated when performing Q* search. Furthermore, Q* search is up to 129 times faster and generates up to 1288 times fewer nodes than A* search. Finally, although obtaining admissible heuristic functions from deep neural networks is an ongoing area of research, we prove that Q* search is guaranteed to find a shortest path given a heuristic function that neither overestimates the cost of a shortest path nor underestimates the transition cost.
Problem

Research questions and friction points this paper is trying to address.

Reducing computational burden in A* search with large action spaces
Learning heuristic functions without generating successor states
Developing efficient pathfinding using deep Q-networks for cost estimation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses deep Q-networks to learn heuristic functions
Estimates Q-values without generating successor states
Reduces computation time and memory usage significantly
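The mechanism behind these points can be sketched in a few lines: where A* generates every child of a node and evaluates a heuristic on each, Q* pushes (parent, action) pairs onto the open list using Q-values obtained from one forward pass on the parent, and generates only the single child it actually pops. Below is a minimal, hypothetical sketch (not the authors' implementation) assuming unit transition costs and a caller-supplied `q_values(state)` function standing in for the DQN, which returns an estimate of cost-to-go for each action:

```python
import heapq
import itertools

def q_star_search(start, goal_test, apply_action, q_values):
    """Sketch of Q* search: one Q-network evaluation per expanded node
    replaces per-child generation and heuristic evaluation."""
    if goal_test(start):
        return 0.0
    counter = itertools.count()  # tie-breaker so heap never compares states
    # Heap entries: (f, tie, g_parent, parent_state, action_to_apply).
    # q_values(s)[a] estimates transition cost plus cost-to-go for action a.
    open_heap = [(q, next(counter), 0.0, start, a)
                 for a, q in enumerate(q_values(start))]
    heapq.heapify(open_heap)
    closed = set()
    while open_heap:
        f, _, g, parent, action = heapq.heappop(open_heap)
        child = apply_action(parent, action)  # generate only this one child
        if child in closed:
            continue
        closed.add(child)
        g_child = g + 1.0  # unit transition cost assumed for this sketch
        if goal_test(child):
            return g_child
        # A single forward pass yields estimates for ALL of child's actions,
        # without generating any grandchildren.
        for a, q in enumerate(q_values(child)):
            heapq.heappush(open_heap,
                           (g_child + q, next(counter), g_child, child, a))
    return None  # no path found

# Toy usage: states are integers, actions are +1/-1, goal is 5.
# q_values mimics an exact Q-function: cost 1 plus distance of the child
# from the goal.
path_cost = q_star_search(
    0,
    lambda s: s == 5,
    lambda s, a: s + 1 if a == 0 else s - 1,
    lambda s: [1 + abs((s + 1) - 5), 1 + abs((s - 1) - 5)],
)
```

The key difference from A* is visible in the loop: `apply_action` runs once per iteration, and the heuristic model is queried once per expanded node rather than once per child.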
Forest Agostinelli
Assistant Professor at the University of South Carolina
Artificial Intelligence, Deep Learning, Reinforcement Learning, Heuristic Search, Logic
A. Shmakov
Department of Computer Science, University of California, Irvine
S. McAleer
Department of Computer Science, University of California, Irvine
Roy Fox
Assistant Professor, UC Irvine
Reinforcement Learning, Algorithmic Game Theory, Information Theory, Robot Learning
P. Baldi
Department of Computer Science, University of California, Irvine