🤖 AI Summary
This study addresses the time-consuming, resource-intensive task of constructing linguistic puzzles by proposing a systematic, rule-driven procedure that converts Rosetta Stone puzzles into Match-Up counterparts, yielding a new dataset of paired puzzles. Experiments with human participants and benchmarks of large language models (LLMs) show that both expert human solvers and LLMs exhibit an “all-or-nothing” pattern on Match-Up puzzles: they either solve a puzzle completely or fail entirely. This finding suggests that the Match-Up format places distinctive demands on systematic linguistic reasoning. The paired dataset and the accompanying evaluation of puzzle difficulty across formats offer a resource for studying the linguistic reasoning of both humans and LLMs.
📝 Abstract
In this paper, we examine linguistic puzzles used in high school linguistics competitions, focusing on two common formats: Rosetta Stone and Match-Up. We propose a systematic procedure for converting existing Rosetta Stone puzzles into corresponding Match-Up counterparts. Because linguistic puzzle creation is complex and time-consuming, our method provides an efficient way to accelerate the generation of new puzzles. We evaluate the resulting pairs of Rosetta Stone and Match-Up puzzles with both human participants and large language models (LLMs). Our results show that both expert human solvers and LLMs display an all-or-nothing pattern on Match-Up puzzles, either solving them completely or failing entirely. This work contributes a new dataset of paired puzzles and provides a detailed evaluation of puzzle difficulty across formats, offering insights into both human and machine linguistic reasoning.