🤖 AI Summary
This work addresses the significant performance gap between general-purpose large language models (LLMs) and even novice human players in specialized domains such as Go, primarily attributed to the absence of effective mechanisms for integrating expert knowledge with general reasoning capabilities. To bridge this gap, the authors propose a novel paradigm that combines structured Go knowledge and chain-of-thought (CoT) reasoning data through hybrid fine-tuning, augmented with reinforcement learning. This approach enables an LLM—named LoGos—to achieve professional-level gameplay entirely within a natural language environment for the first time. LoGos substantially outperforms existing LLMs in both strategic reasoning and move prediction. The study also introduces the first large-scale Go-specific LLM training dataset and evaluation benchmark, facilitating future research in domain-specialized language models.
📝 Abstract
Large language models (LLMs) have demonstrated exceptional performance in reasoning tasks such as mathematics and coding, matching or surpassing human capabilities. However, these impressive reasoning abilities face significant challenges in specialized domains. Taking Go as an example, although AlphaGo has established the high performance ceiling of AI systems in Go, mainstream LLMs still struggle to reach even beginner-level proficiency, let alone reason about the game in natural language. This performance gap between general-purpose LLMs and domain experts significantly limits the application of LLMs to a wider range of domain-specific tasks. In this work, we aim to bridge the divide between LLMs' general reasoning capabilities and expert knowledge in domain-specific tasks. We perform mixed fine-tuning with structured Go expertise and general long Chain-of-Thought (CoT) reasoning data as a cold start, followed by reinforcement learning to integrate expert knowledge in Go with general reasoning capabilities. Through this methodology, we present **LoGos**, a powerful LLM that not only maintains outstanding general reasoning abilities, but also conducts Go gameplay in natural language, demonstrating effective strategic reasoning and accurate next-move prediction. LoGos achieves performance comparable to human professional players, substantially surpassing all existing LLMs. Through this work, we aim to contribute insights on applying general LLM reasoning capabilities to specialized domains. We will release the first large-scale Go dataset for LLM training, the first LLM Go evaluation benchmark, and the first general LLM that reaches human professional-level performance in Go at: https://github.com/Entarochuan/LoGos.