🤖 AI Summary
Traditional CFR-based algorithms suffer exponential growth in computational complexity as the number of players increases, and in games with three or more players a Nash-equilibrium strategy no longer guarantees a non-losing outcome, limiting their applicability to popular tournament formats such as Spin & Go. To address this, we propose SpinGPT, the first large language model (LLM) tailored to Spin & Go, a three-player imperfect-information poker format. SpinGPT is trained in two stages: (i) supervised fine-tuning on 320,000 high-stakes human expert decisions, followed by (ii) reinforcement learning on 270,000 solver-generated hands. Experimental results show that SpinGPT matches the solver's action in 78% of decisions (tolerant accuracy) and, with a simple deep-stack heuristic, achieves a win rate of 13.4 ± 12.9 BB/100 (95% CI) against Slumbot in heads-up play over 30,000 hands. These results position LLMs as a promising new approach to strategic reasoning in multi-player imperfect-information games.
📝 Abstract
The Counterfactual Regret Minimization (CFR) algorithm and its variants have enabled the development of pokerbots capable of beating the best human players in heads-up (1v1) cash games and competing with them in six-player formats. However, CFR's computational complexity rises exponentially with the number of players. Furthermore, in games with three or more players, following a Nash equilibrium no longer guarantees a non-losing outcome. These limitations, along with others, significantly restrict the applicability of CFR to the most popular formats: tournaments. Motivated by the recent success of Large Language Models (LLMs) in chess and Diplomacy, we present SpinGPT, the first LLM tailored to Spin & Go, a popular three-player online poker format. SpinGPT is trained in two stages: (1) Supervised Fine-Tuning on 320k high-stakes expert decisions; (2) Reinforcement Learning on 270k solver-generated hands. Our results show that SpinGPT matches the solver's actions in 78% of decisions (tolerant accuracy). With a simple deep-stack heuristic, it achieves 13.4 ± 12.9 BB/100 (95% CI) against Slumbot in heads-up play over 30,000 hands. These results suggest that LLMs could be a new way to deal with multi-player imperfect-information games like poker.
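As context for the reported win rate, an interval like 13.4 ± 12.9 BB/100 over 30,000 hands is typically a normal-approximation confidence interval on the per-hand mean, scaled to per-100-hands units. A minimal sketch of that arithmetic follows; the per-hand standard deviation used here is an assumption chosen to reproduce the reported margin, not a figure from the paper:

```python
import math

def bb100_confidence_interval(mean_bb100, sd_bb_per_hand, n_hands, z=1.96):
    """Normal-approximation confidence interval for a win rate in BB/100.

    mean_bb100:     observed win rate in big blinds per 100 hands.
    sd_bb_per_hand: per-hand standard deviation in big blinds (assumed).
    n_hands:        number of hands in the sample.
    z:              critical value (1.96 for a 95% interval).
    """
    # Standard error of the per-hand mean, rescaled to BB/100 units.
    se_bb100 = 100 * sd_bb_per_hand / math.sqrt(n_hands)
    margin = z * se_bb100
    return mean_bb100 - margin, mean_bb100 + margin

# Illustrative only: an SD of ~11.4 BB/hand over 30,000 hands yields
# a margin of ~12.9 BB/100, matching the paper's reported interval.
lo, hi = bb100_confidence_interval(13.4, sd_bb_per_hand=11.4, n_hands=30_000)
```

With these assumed inputs the interval is roughly (0.5, 26.3): it excludes zero, which is what makes the result statistically significant at the 95% level, though only narrowly.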