🤖 AI Summary
This study evaluates large language models’ (LLMs) capabilities in solving and generating linguistics olympiad puzzles aimed at high school students. Covering canonical puzzle types, including phonology, morphology, and syntax, it applies state-of-the-art models (e.g., OpenAI o1) with structured prompt engineering and analyzes performance across linguistic topics. The work extends an existing benchmark for puzzle solving and introduces automated puzzle generation as a novel task. Experimental results show that LLMs outperform humans on most puzzle types, with two exceptions: puzzles centered on writing systems and puzzles involving understudied languages. Insights from the solving experiments are used to guide puzzle generation. Beyond benchmarking, the authors argue that automated generation, even of relatively simple puzzles, can support linguistics outreach and the dissemination of knowledge about rare and understudied languages. The research offers a methodological and empirical basis for this work, connecting theoretical linguistics, NLP, and pedagogy.
📝 Abstract
In this paper, we introduce a combination of novel and exciting tasks: the solution and generation of linguistic puzzles. We focus on puzzles used in Linguistic Olympiads for high school students. We first extend the existing benchmark for the task of solving linguistic puzzles. We then explore the use of Large Language Models (LLMs), including recent state-of-the-art models such as OpenAI's o1, for solving linguistic puzzles, analyzing their performance across various linguistic topics. We demonstrate that LLMs outperform humans on most puzzle types, except for those centered on writing systems and those involving understudied languages. We use the insights from the puzzle-solving experiments to direct the novel task of puzzle generation. We believe that automating puzzle generation, even for relatively simple puzzles, holds promise for expanding interest in linguistics and introducing the field to a broader audience. This highlights the importance of linguistic puzzle generation as a research task: such puzzles can not only promote linguistics but also support the dissemination of knowledge about rare and understudied languages.