Probing Structural Mathematical Reasoning in Language Models with Algebraic Trapdoors

📅 2026-05-05
📈 Citations: 0
Influential: 0

📝 Abstract
We introduce a benchmark suite for evaluating structural mathematical reasoning in language models, built on subgroup-construction problems in SL(3, Z) with a cryptographic-style verifier-prover asymmetry. Each instance presents a finitely generated subgroup as a list of integer matrices and asks for an arithmetic invariant (index, surjection-at-prime, or membership). The construction-time information (N, K) pins this invariant down in O(1) closed form, but the solver, lacking that information, must derive it either by Aschbacher-classification analysis or by a membership query in SL(3, Z) of unknown decidability. The benchmark therefore distinguishes models with internalized algebraic priors (Aschbacher classes, McLaughlin's theorem, Property (T), the congruence subgroup property) from models that rely on general-purpose computation. We report empirical results across five representative reasoning traces from two state-of-the-art models. The headline result: on the index variant, one model spent 152 minutes of reasoning, explicitly identified the kernel-side membership question as the bottleneck, attempted constructive verification, and abstained with "DON'T KNOW" rather than commit to its computed cokernel candidate, demonstrating calibrated meta-cognition on the open-decidability boundary that the benchmark was designed to probe. We argue that the benchmark exposes a four-way classification of model behavior (commit-correct, commit-wrong, abstain-correct, abstain-wrong) that standard answer-key scoring conflates.
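The four-way classification can be made concrete with a small scoring sketch. The abstract does not define the four labels precisely, so the code below assumes one plausible reading: a committed answer is compared to the answer key, and an abstention is labeled by whether the candidate the model computed but declined to report matches the key. The names `Trace`, `classify`, `answer_key_score`, and the toy values are hypothetical and are not taken from the paper.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional

ABSTAIN = "DON'T KNOW"  # abstention string quoted in the abstract


@dataclass
class Trace:
    """Hypothetical record of one reasoning trace (not the paper's schema)."""
    final_answer: str          # what the model reported, or ABSTAIN
    candidate: Optional[str]   # candidate computed inside the trace, if any
    answer_key: str            # closed-form value known at construction time


def classify(trace: Trace) -> str:
    """Assign one of the four behavior labels under the assumed reading."""
    if trace.final_answer != ABSTAIN:
        if trace.final_answer == trace.answer_key:
            return "commit-correct"
        return "commit-wrong"
    # Assumption: an abstention is labeled by the withheld candidate.
    # A trace with no candidate at all falls into "abstain-wrong" here.
    if trace.candidate == trace.answer_key:
        return "abstain-correct"
    return "abstain-wrong"


def answer_key_score(trace: Trace) -> int:
    """Standard scoring: 1 only if the reported answer matches the key."""
    return 1 if trace.final_answer == trace.answer_key else 0


def tally(traces: Iterable[Trace]) -> Counter:
    """Count how many traces fall into each of the four labels."""
    return Counter(classify(t) for t in traces)


# Toy values for illustration only; these are not results from the paper.
toy_traces = [
    Trace("12", "12", "12"),
    Trace("6", "6", "12"),
    Trace(ABSTAIN, "12", "12"),
    Trace(ABSTAIN, "6", "12"),
]
```

Under this reading, both abstaining toy traces receive the same answer-key score of 0 while landing in different behavior classes, which is the conflation the abstract attributes to standard scoring.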
Problem

Research questions and friction points this paper is trying to address.

structural mathematical reasoning
language models
algebraic trapdoors
subgroup membership
arithmetic invariants
Innovation

Methods, ideas, or system contributions that make the work stand out.

structural mathematical reasoning
algebraic trapdoors
SL(3, Z)
Aschbacher classification
meta-cognition