🤖 AI Summary
This work proposes the “reduction ladder” framework to address an oversimplified dichotomy in current evaluations, which classify large language models’ performance on epistemic reasoning tasks as either memorization or genuine reasoning and thereby overlook intermediate reduction-based strategies. By systematically generating progressively modified variants of classic epistemic logic puzzles, each incrementally harder and designed to disrupt reliance on memorized problem mappings, the framework disentangles memorization, reduction strategies, and genuine epistemic reasoning. Experiments show that while some large models succeed by leveraging reduction strategies, they generally underperform on tasks that require genuine epistemic reasoning. By treating memorization as a special case of reduction, this approach offers a new paradigm for fine-grained analysis of model reasoning behavior.
📝 Abstract
Epistemic reasoning requires agents to infer the state of the world from partial observations and from information about other agents' knowledge. Prior work evaluating LLMs on canonical epistemic puzzles has interpreted their behavior through a dichotomy between epistemic reasoning and brittle memorization. We argue that this framing is incomplete: in recent models, memorization is better understood as a special case of reduction, in which a new instance is mapped onto a known problem. We therefore introduce a reduction ladder, a sequence of modifications that progressively move instances away from a canonical epistemic puzzle, making reduction increasingly difficult while preserving the underlying logic. We find that while some large models succeed via reduction, other models fail early, and all models struggle once genuine epistemic reasoning is required.
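The abstract does not show the ladder construction itself, but the idea can be illustrated with a minimal sketch: start from a canonical puzzle and cumulatively apply perturbations, so that each rung moves the instance further from any memorized form while keeping the answer fixed. Everything below is a hypothetical illustration, not the authors' implementation; the puzzle text, the perturbation functions (`rename_agents`, `change_surface_story`, `embed_distractors`), and the rung structure are assumptions for the sake of the example.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class PuzzleInstance:
    """A puzzle variant plus the rung of the ladder it sits on."""
    rung: int    # 0 = canonical instance; higher = further from it
    text: str    # natural-language statement of the puzzle
    answer: str  # ground-truth answer, preserved across rungs

# Hypothetical perturbations, ordered roughly from cosmetic to structural.
# Each keeps the underlying epistemic logic (and hence the answer) intact.
def rename_agents(text: str) -> str:
    return text.replace("Alice", "Priya").replace("Bob", "Tomas")

def change_surface_story(text: str) -> str:
    return text.replace("hats", "badges").replace("color", "symbol")

def embed_distractors(text: str) -> str:
    return text + " (Irrelevant fact: the room has three windows.)"

def build_ladder(canonical: PuzzleInstance,
                 steps: List[Callable[[str], str]]) -> List[PuzzleInstance]:
    """Apply perturbations cumulatively: rung k composes the first k steps,
    so each rung is strictly further from the memorized canonical form."""
    ladder = [canonical]
    text = canonical.text
    for k, step in enumerate(steps, start=1):
        text = step(text)
        ladder.append(PuzzleInstance(rung=k, text=text, answer=canonical.answer))
    return ladder

canonical = PuzzleInstance(
    rung=0,
    text="Alice and Bob each see the other's hat color but not their own.",
    answer="blue",
)
for inst in build_ladder(canonical,
                         [rename_agents, change_surface_story, embed_distractors]):
    print(inst.rung, inst.text)
```

Under this reading, a model that answers correctly at rung 0 but degrades as the rung increases is plausibly reducing to (or recalling) the canonical puzzle rather than reasoning from the epistemic structure itself.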