🤖 AI Summary
AlphaZero-style Monte Carlo Tree Search (MCTS) algorithms train inefficiently in sparse-reward environments, especially during early learning, when the policy network provides little useful guidance. Method: This paper integrates Hindsight Experience Replay (HER) into the AlphaZero framework as an adjustable mechanism that allows dynamic configuration of goal relabeling, policy targets, and trajectory selection, overcoming traditional HER's limited adaptability in search-guided learning. The approach unifies MCTS, neural network guidance, supervised learning, and reinforcement learning, with an improved replay strategy that strengthens the model's ability to learn from sparse feedback. Contribution/Results: Experiments on symbolic regression (equation discovery) show that the proposed method significantly outperforms both pure supervised and pure reinforcement learning baselines, supporting its effectiveness and cross-task generalization.
📝 Abstract
AlphaZero-like Monte Carlo Tree Search systems, originally introduced for two-player games, dynamically balance exploration and exploitation using neural network guidance. This combination also makes them suitable for classical search problems. However, the original method of training the network with simulation results is limited in sparse reward settings, especially in the early stages, where the network cannot yet give guidance. Hindsight Experience Replay (HER) addresses this issue by relabeling unsuccessful trajectories from the search tree as supervised learning signals. We introduce Adaptable HER, a flexible framework that integrates HER with AlphaZero, allowing easy adjustments to HER properties such as relabeled goals, policy targets, and trajectory selection. Our experiments, including equation discovery, show that the possibility of modifying HER is beneficial and surpasses the performance of pure supervised or reinforcement learning.
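To make the relabeling idea concrete, here is a minimal sketch of HER-style goal relabeling as described in the abstract: a trajectory that failed to reach its original goal is reinterpreted so that the state it actually reached becomes the target, turning the search trace into positive supervised examples. All names (`Step`, `relabel_trajectory`) are illustrative assumptions, not the paper's actual API.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Step:
    state: str    # observed state, e.g. a partial expression in equation discovery
    action: int   # action chosen by the search at this state

def relabel_trajectory(steps: List[Step], original_goal: str) -> List[Tuple[str, str, int]]:
    """Return (state, relabeled_goal, action) training triples.

    The hindsight goal is the final state the trajectory actually reached,
    so every recorded action becomes a correct move toward that goal,
    even though the trajectory failed to reach `original_goal`.
    """
    hindsight_goal = steps[-1].state
    return [(s.state, hindsight_goal, s.action) for s in steps]

# A trajectory that failed to reach "x**2 + 1" still yields supervised data:
traj = [Step("x", 0), Step("x + 1", 2), Step("x + sin(1)", 5)]
examples = relabel_trajectory(traj, original_goal="x**2 + 1")
# Each triple pairs a visited state with the hindsight goal "x + sin(1)".
```

The paper's adjustable variant would additionally expose choices such as which trajectories to relabel and what policy targets to derive from them; this sketch shows only the core relabeling step.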