🤖 AI Summary
Although the mathematical reasoning traces of large language models (LLMs) are readable at the surface level, their underlying cognitive structure remains opaque to statistical feature analysis. Method: We propose ThinkARM, the first framework to integrate Schoenfeld’s episode theory into LLM reasoning analysis, abstracting reasoning traces into identifiable functional phases (Analysis, Explore, Implement, Verify) for meso-scale dynamic modeling. Contribution/Results: We identify Explore as a critical branching step; efficiency gains arise not from uniformly shortening traces but from selectively suppressing evaluative verification steps. Across diverse mathematical benchmarks and multiple LLMs, ThinkARM reproduces reasoning dynamics robustly and clearly differentiates the structural patterns of reasoning-capable models from those of non-reasoning models. The framework establishes an interpretable, intervenable paradigm for analyzing LLM reasoning processes.
📝 Abstract
Large language models increasingly expose reasoning traces, yet their underlying cognitive structure and steps remain difficult to identify and analyze beyond surface-level statistics. We adopt Schoenfeld's Episode Theory as an inductive, intermediate-scale lens and introduce ThinkARM (Anatomy of Reasoning in Models), a scalable framework that explicitly abstracts reasoning traces into functional reasoning steps such as Analysis, Explore, Implement, and Verify. Applied to mathematical problem solving across diverse models, this abstraction reveals reproducible thinking dynamics and structural differences between reasoning and non-reasoning models that are not apparent from token-level views. We further present two diagnostic case studies showing that exploration functions as a critical branching step associated with correctness, and that efficiency-oriented methods selectively suppress evaluative feedback steps rather than uniformly shortening responses. Together, our results demonstrate that episode-level representations make reasoning steps explicit, enabling systematic analysis of how reasoning is structured, stabilized, and altered in modern language models.
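To make the episode abstraction concrete, here is a minimal toy sketch of tagging a reasoning trace with Schoenfeld-style functional phases (Analysis, Explore, Implement, Verify). The keyword heuristic, cue lists, and function names below are purely illustrative assumptions; the abstract does not specify how ThinkARM itself performs the labeling.

```python
# Toy illustration (not the ThinkARM method): label each step of a
# reasoning trace with a functional episode based on hypothetical cue phrases.

EPISODE_CUES = {
    "Analysis": ("the problem asks", "we need to find", "given that"),
    "Explore": ("what if", "alternatively", "let's try"),
    "Implement": ("compute", "substituting", "solving"),
    "Verify": ("check", "confirm", "consistent"),
}

def tag_steps(steps):
    """Assign each reasoning step an episode label, or 'Other' if no cue matches."""
    labels = []
    for step in steps:
        low = step.lower()
        label = next(
            (episode for episode, cues in EPISODE_CUES.items()
             if any(cue in low for cue in cues)),
            "Other",
        )
        labels.append(label)
    return labels

trace = [
    "The problem asks for the sum of the first 10 odd numbers.",
    "Let's try pairing terms from the ends.",
    "Compute 10^2 = 100.",
    "Check: 1 + 3 + ... + 19 = 100, so the result holds.",
]
print(tag_steps(trace))  # -> ['Analysis', 'Explore', 'Implement', 'Verify']
```

Once traces are mapped to such episode sequences, meso-scale questions (e.g., how often Explore branches into Verify, or which episodes an efficiency method suppresses) reduce to counting transitions over these labels.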