🤖 AI Summary
This study investigates the capacity of mainstream large language models (LLMs)—including GPT-4, Claude-3, and Llama-3—to jointly solve 6×6 Sudoku puzzles and generate strategic, stepwise, human-interpretable natural language explanations, focusing on explainability rather than mere answer correctness.
Method: We conduct zero-shot and few-shot prompting experiments, evaluating both solution accuracy and explanation quality via human assessment and logical consistency analysis.
Contribution/Results: Only one model demonstrates baseline puzzle-solving capability; none reliably produce explanations reflecting heuristic strategies, incremental reasoning, or cognitive accessibility. To our knowledge, this is the first empirical study to rigorously assess explanation quality—specifically, strategic interpretability—in structured reasoning tasks. Our findings expose a fundamental limitation in current LLMs’ ability to articulate deliberate, pedagogically sound reasoning processes. The work establishes a novel evaluation benchmark for trustworthy human-AI collaborative decision-making, emphasizing transparency, strategy awareness, and explanatory fidelity over output correctness alone.
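The summary above describes the evaluation protocol only at a high level. As a rough illustration, the minimal Python sketch below shows how a zero-shot/few-shot prompting harness and a simple solution-accuracy metric for 6×6 puzzles might be wired up. The function names (`query_model`, `build_few_shot_prompt`, `grid_accuracy`) and the prompt wording are hypothetical and not taken from the paper; explanation quality would still be judged separately by human raters and logical-consistency checks.

```python
def format_grid(grid):
    """Render a 6x6 Sudoku grid as text, using '.' for empty cells (0)."""
    return "\n".join(
        " ".join(str(c) if c != 0 else "." for c in row) for row in grid
    )

# Zero-shot prompt: no worked examples, just the task description and the puzzle.
ZERO_SHOT_TEMPLATE = (
    "Solve this 6x6 Sudoku. Each row, column, and 2x3 box must contain the digits "
    "1-6 exactly once. Explain your reasoning step by step, then give the "
    "completed grid.\n\n{puzzle}"
)

def build_few_shot_prompt(examples, puzzle):
    """Few-shot prompt: prepend worked puzzle/solution pairs before the target puzzle."""
    parts = []
    for ex in examples:
        parts.append("Puzzle:\n" + format_grid(ex["puzzle"]))
        parts.append("Solution:\n" + format_grid(ex["solution"]))
    parts.append("Puzzle:\n" + format_grid(puzzle) + "\nSolution:")
    return "\n\n".join(parts)

def query_model(prompt: str) -> str:
    """Stand-in for a chat-completion call to GPT-4, Claude-3, Llama-3, etc.
    (hypothetical; swap in whichever client the evaluated model exposes)."""
    raise NotImplementedError

def grid_accuracy(predicted, solution):
    """Solution-accuracy metric: fraction of cells matching the reference grid."""
    pairs = [
        (p, s)
        for prow, srow in zip(predicted, solution)
        for p, s in zip(prow, srow)
    ]
    return sum(p == s for p, s in pairs) / len(pairs)
```

Under these assumptions, a zero-shot run would send `ZERO_SHOT_TEMPLATE.format(puzzle=format_grid(puzzle))` to the model, parse the returned grid, and score it with `grid_accuracy`; the few-shot condition swaps in `build_few_shot_prompt` with a handful of worked examples.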
📝 Abstract
The success of Large Language Models (LLMs) in human-AI collaborative decision-making hinges on their ability to provide trustworthy, gradual, and tailored explanations. Solving complex puzzles, such as Sudoku, offers a canonical example of this collaboration, where clear and customized explanations often hold greater importance than the final solution. In this study, we evaluate the performance of five LLMs in solving and explaining 6×6 Sudoku puzzles. While one LLM demonstrates limited success in solving puzzles, none can explain the solution process in a manner that reflects strategic reasoning or intuitive problem-solving. These findings underscore significant challenges that must be addressed before LLMs can become effective partners in human-AI collaborative decision-making.