🤖 AI Summary
This study addresses the challenge of applying deep learning to complex game-theoretic decision-making under resource constraints. The authors propose a lightweight hybrid framework that, for the first time, integrates a large language model (GPT-4o-mini) with a graph attention autoencoder, enhanced by multi-step Monte Carlo tree search and a stochastic graph genetic algorithm. Operating without expert demonstrations, the method leverages synthetic data generated by the LLM for noise-robust supervised learning, in which the graph structure denoises the LLM's outputs and enables weak-to-strong policy generalization. Evaluated on a 10×10 Amazons board, the approach achieves a 66.5% win rate against its teacher model using only 50 inference nodes and outperforms baseline methods by 15%–56% in decision accuracy.
📝 Abstract
Artificial intelligence has advanced significantly through the development of intelligent game-playing systems, which provide rigorous testbeds for decision-making, strategic planning, and adaptive learning. However, resource-constrained environments pose critical challenges, as conventional deep learning methods rely heavily on extensive datasets and computational resources. In this paper, we propose a lightweight hybrid framework for the Game of the Amazons that explores the paradigm of weak-to-strong generalization by integrating the structural reasoning of graph-based learning with the generative capabilities of large language models. Specifically, we leverage a Graph Attention Autoencoder to inform a multi-step Monte Carlo Tree Search, a Stochastic Graph Genetic Algorithm to optimize evaluation signals, and GPT-4o-mini to generate synthetic training data. Unlike traditional approaches that rely on expert demonstrations, our framework learns from noisy, imperfect supervision: the Graph Attention mechanism functions as a structural filter that denoises the LLM's outputs. Experiments on a 10×10 Amazons board show that our hybrid approach not only improves decision accuracy by 15%–56% over baselines but also significantly outperforms its teacher model (GPT-4o-mini), achieving a competitive win rate of 45.0% at N=30 nodes and a decisive 66.5% at only N=50 nodes. These results demonstrate the feasibility of evolving specialized, high-performance game AI from general-purpose foundation models under stringent computational constraints.
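To make the node-budget constraint concrete, the sketch below is a plain UCT Monte Carlo Tree Search run under a fixed simulation budget, with each simulation growing the tree by at most one node (loosely standing in for the paper's N "inference nodes"). It is a toy illustration only: it plays a trivial subtraction game rather than Amazons, omits the Graph Attention Autoencoder, genetic algorithm, and LLM components entirely, and every name in it is hypothetical, not taken from the paper's implementation.

```python
import math
import random

# Toy stand-in game (NOT Amazons): players alternately remove 1 or 2
# stones from a pile; whoever takes the last stone wins.

def legal_moves(pile):
    """Moves available from a pile of `pile` stones."""
    return [m for m in (1, 2) if m <= pile]

class Node:
    """One node of the search tree."""
    def __init__(self, state, player, parent=None, move=None):
        self.state = state          # stones remaining
        self.player = player        # +1 / -1, player to move at this node
        self.parent = parent
        self.move = move            # move that led from parent to this node
        self.children = []
        self.untried = legal_moves(state)
        self.visits = 0
        self.wins = 0.0             # wins for the player who moved INTO this node

def mcts(root_state, n_simulations=50, seed=0, c=math.sqrt(2)):
    """UCT search under a fixed budget: each simulation adds at most one
    tree node, mimicking a small 'inference node' allowance."""
    rng = random.Random(seed)
    root = Node(root_state, player=1)
    for _ in range(n_simulations):
        node = root
        # 1. Selection: descend fully expanded nodes by the UCT rule.
        while not node.untried and node.children:
            node = max(node.children,
                       key=lambda ch: ch.wins / ch.visits
                       + c * math.sqrt(math.log(node.visits) / ch.visits))
        # 2. Expansion: add one child if this node has untried moves.
        if node.untried:
            m = rng.choice(node.untried)
            node.untried.remove(m)
            child = Node(node.state - m, -node.player, parent=node, move=m)
            node.children.append(child)
            node = child
        # 3. Rollout: play out uniformly at random to a terminal state.
        state, player = node.state, node.player
        while state > 0:
            state -= rng.choice(legal_moves(state))
            player = -player
        winner = -player            # the player to move at an empty pile has lost
        # 4. Backpropagation: credit each node's wins to the player who moved into it.
        while node is not None:
            node.visits += 1
            if node.parent is not None and node.parent.player == winner:
                node.wins += 1
            node = node.parent
    # Recommend the most-visited root move.
    return max(root.children, key=lambda ch: ch.visits).move
```

With a pile of 4 stones, taking 1 leaves the opponent the losing position 3; given a sufficient budget the search concentrates its visits on that move. The paper's framework replaces the random rollout with evaluation signals from the graph attention autoencoder, which is what lets it reach strong play at budgets as small as N=50.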