🤖 AI Summary
This work investigates the capability of large language models (LLMs) to solve crossword puzzles, addressing two core challenges: interpreting individual cryptic clues and completing full grids. We propose an end-to-end, constraint-aware grid-solving framework: first, semantic and lexical constraints are extracted from clues via prompt engineering and structured parsing; second, a constraint-driven backtracking search refines LLM-generated candidate answers to ensure grid-wide consistency, enabling interpretable and verifiable inference. To our knowledge, this is the first fully automated crossword grid solver built on out-of-the-box LLMs, with no human intervention. Our method achieves 93% cell-level accuracy on New York Times crossword puzzles and outperforms prior state-of-the-art results on cryptic-clue benchmarks by a factor of 2–3. The framework establishes a new paradigm for applying LLMs to structured reasoning and symbol-constrained combinatorial tasks.
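The second step of the framework, the constraint-driven backtracking search over LLM-generated candidates, can be illustrated with a minimal sketch. The paper does not specify the implementation; the slot/crossing representation and function names below are assumptions for illustration only. Each grid slot carries a candidate list (as would be produced by an LLM), and the search accepts an assignment only when every crossing cell agrees:

```python
def consistent(assignment, crossings):
    """Check that every shared cell between two assigned slots matches.

    crossings: list of (slot_a, idx_a, slot_b, idx_b), meaning character
    idx_a of slot_a's word occupies the same cell as character idx_b of
    slot_b's word.
    """
    for a, i, b, j in crossings:
        if a in assignment and b in assignment:
            if assignment[a][i] != assignment[b][j]:
                return False
    return True


def solve(slots, candidates, crossings, assignment=None):
    """Backtracking search over per-slot candidate words.

    slots: list of slot ids; candidates: slot id -> ranked candidate words
    (e.g. from an LLM). Returns a grid-consistent assignment or None.
    """
    if assignment is None:
        assignment = {}
    if len(assignment) == len(slots):
        return dict(assignment)  # every slot filled consistently
    slot = next(s for s in slots if s not in assignment)
    for word in candidates[slot]:
        assignment[slot] = word
        if consistent(assignment, crossings):
            result = solve(slots, candidates, crossings, assignment)
            if result is not None:
                return result
        del assignment[slot]  # backtrack and try the next candidate
    return None
```

For example, if an across slot with candidates `["DOG", "CAT"]` shares its last cell with a down slot whose only candidate is `"TEA"`, the search rejects `"DOG"` (final `G` clashes with `T`) and settles on `"CAT"`. A real solver would add ideas the summary implies but this sketch omits, such as length filtering and re-querying the LLM when no candidate fits.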
📝 Abstract
Crosswords are a form of word puzzle that requires a solver to demonstrate a high degree of proficiency in natural language understanding, wordplay, reasoning, and world knowledge, along with adherence to character and length constraints. In this paper we tackle the challenge of solving crosswords with large language models (LLMs). We demonstrate that the current generation of language models shows significant competence at deciphering cryptic crossword clues and outperforms previously reported state-of-the-art (SoTA) results by a factor of 2–3 on relevant benchmarks. We also develop a search algorithm that builds on this performance to tackle, for the first time, the problem of solving full crossword grids with out-of-the-box LLMs, achieving 93% accuracy on New York Times crossword puzzles. Additionally, we demonstrate that LLMs generalize well and can support their answers with sound rationale.