Mixture of Masters: Sparse Chess Language Models with Player Routing

📅 2026-02-04
🤖 AI Summary
This work addresses a limitation of existing chess language models: a single dense architecture fails to preserve the distinctive strategies and rare but effective moves of individual grandmasters, leading to homogenized playing styles. To overcome this, we propose the first chess language model based on a sparse mixture-of-experts (MoE) architecture, featuring a learnable state-aware gating network that dynamically routes inputs to specialized small GPT experts, each trained to emulate the style of a specific grandmaster. The approach combines self-supervised pretraining with reinforcement learning guided by chess-specific rewards, enabling style-adaptive move generation. In evaluations against Stockfish on unseen standard games, the model outperforms both single-expert baselines and conventional GPT models trained on aggregated data, while significantly improving move diversity, controllability, and interpretability.

📝 Abstract
Modern chess language models are dense transformers trained on millions of games played by thousands of high-rated individuals. However, these monolithic networks tend to collapse into mode-averaged behavior, where stylistic boundaries are blurred, and rare but effective strategies are suppressed. To counteract homogenization, we introduce Mixture-of-Masters (MoM), the first chess mixture-of-experts model with small-sized GPT experts emulating world-class grandmasters. Each expert is trained with a combination of self-supervised learning and reinforcement learning guided by chess-specific rewards. For each move, a post-hoc learnable gating network selects the most appropriate persona to channel depending on the game state, allowing MoM to switch its style dynamically, e.g., Tal's offensive vocation or Petrosian's defensive solidity. When evaluated against Stockfish on unseen standard games, MoM outperforms both dense individual expert networks and popular GPT baselines trained on aggregated data, while ensuring generation variety, control, and interpretability.
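The core mechanism described above, a gating network that scores persona experts on the current game state and sparsely routes each move to the winners, can be illustrated with a minimal sketch. This is not the paper's implementation: the expert names, the toy state/vocabulary dimensions, and the linear gate and policy heads are all hypothetical stand-ins for the learned GPT experts and state encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

EXPERTS = ["Tal", "Petrosian", "Capablanca"]  # hypothetical persona experts
STATE_DIM, VOCAB = 8, 5  # toy board-state embedding size and move vocabulary

# Hypothetical per-expert "policy heads": each maps a state vector to move logits.
expert_weights = [rng.normal(size=(STATE_DIM, VOCAB)) for _ in EXPERTS]

# State-aware gating network (here: a single random linear layer; in the paper
# this is learnable and trained post hoc over frozen experts).
gate_weights = rng.normal(size=(STATE_DIM, len(EXPERTS)))

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def route_and_predict(state, top_k=1):
    """Top-k sparse routing: score personas on the game state, dispatch to the
    winning experts only, and mix their move distributions by renormalized
    gate weights."""
    gate_scores = softmax(state @ gate_weights)
    chosen = np.argsort(gate_scores)[-top_k:]          # indices of top-k experts
    mix = gate_scores[chosen] / gate_scores[chosen].sum()
    move_probs = sum(w * softmax(state @ expert_weights[i])
                     for i, w in zip(chosen, mix))
    return [EXPERTS[i] for i in chosen], move_probs

state = rng.normal(size=STATE_DIM)  # stand-in for a learned board embedding
personas, probs = route_and_predict(state, top_k=1)
print("routed to:", personas)
print("move distribution:", probs.round(3))
```

With `top_k=1` only one expert runs per move, which is what makes the mixture sparse: compute stays close to a single small GPT while the gate switches personas as the position changes.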
Problem

Research questions and friction points this paper is trying to address.

chess language models
mode averaging
style homogenization
rare-strategy suppression
player-specific behavior
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture-of-Experts
Chess Language Models
Player Routing
Reinforcement Learning
Style Emulation
Giacomo Frisoni
Department of Computer Science and Engineering, University of Bologna
Lorenzo Molfetta
Department of Computer Science and Engineering, University of Bologna
Davide Freddi
Department of Computer Science and Engineering, University of Bologna
Gianluca Moro
Department of Computer Science and Engineering, University of Bologna, Cesena
natural language processing, data science, data mining, machine learning, sensor networks, agents, peer-to-peer systems