🤖 AI Summary
Existing evaluation of LLM mathematical reasoning leans heavily on static accuracy metrics, which miss latent defects in reasoning dynamics. Method: We propose MathBode, a dynamic diagnostic framework for LLMs that imports Bode analysis from control theory. It treats each parametric math problem as a system, drives a single parameter with a sinusoidal input, and fits the first-harmonic response of model outputs to extract gain–phase frequency-response "fingerprints." Contribution/Results: Evaluated across five closed-form problem families against a symbolic baseline, MathBode reveals, in the frequency domain, pervasive low-pass behavior and phase lag in LLMs. It quantitatively separates reasoning fidelity from consistency, yields a compact and reproducible evaluation protocol, and is fully open-sourced, including datasets and code.
📝 Abstract
This paper presents MathBode, a dynamic diagnostic for mathematical reasoning in large language models (LLMs). Instead of one-shot accuracy, MathBode treats each parametric problem as a system: we drive a single parameter sinusoidally and fit first-harmonic responses of model outputs and exact solutions. This yields interpretable, frequency-resolved metrics -- gain (amplitude tracking) and phase (lag) -- that form Bode-style fingerprints. Across five closed-form families (linear solve, ratio/saturation, compound interest, 2×2 linear systems, similar triangles), the diagnostic surfaces systematic low-pass behavior and growing phase lag that accuracy alone obscures. We compare several models against a symbolic baseline that calibrates the instrument ($G \approx 1$, $\varphi \approx 0$). Results separate frontier from mid-tier models on dynamics, providing a compact, reproducible protocol that complements standard benchmarks with actionable measurements of reasoning fidelity and consistency. We open-source the dataset and code to enable further research and adoption.
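The core measurement (drive one parameter sinusoidally, then compare the first harmonic of the model's answers against the exact solutions to get a gain and phase at that frequency) can be sketched as below. This is a minimal illustration, not the paper's released code: the function names, the toy linear-solve problem, and the simulated "model" response with attenuation and lag are assumptions for demonstration.

```python
import numpy as np

def first_harmonic(signal, t, freq):
    # Complex Fourier projection of the signal onto the drive frequency:
    # for A*sin(2*pi*freq*t + phi), this returns a coefficient with
    # magnitude A, so amplitude ratios and phase differences fall out directly.
    ref = np.exp(-2j * np.pi * freq * t)
    return 2.0 * np.mean(signal * ref)

def bode_point(model_out, exact_out, t, freq):
    # One gain/phase point of the Bode-style fingerprint at this frequency.
    m = first_harmonic(model_out, t, freq)
    s = first_harmonic(exact_out, t, freq)
    gain = np.abs(m) / np.abs(s)          # amplitude tracking (1 = perfect)
    phase = np.angle(m) - np.angle(s)     # lag in radians (negative = lagging)
    return gain, phase

# Toy linear-solve family: x = b / a, with b(t) driven sinusoidally.
t = np.linspace(0.0, 1.0, 400, endpoint=False)  # integer number of periods
freq = 3.0
a, b0, amp = 2.0, 10.0, 1.0
b = b0 + amp * np.sin(2 * np.pi * freq * t)
exact = b / a
# Simulated model answers: 80% of the amplitude, 0.3 rad of lag.
model = b0 / a + 0.8 * (amp / a) * np.sin(2 * np.pi * freq * t - 0.3)

G, phi = bode_point(model, exact, t, freq)
# G ≈ 0.8 (low-pass attenuation), phi ≈ -0.3 rad (phase lag)
```

Sweeping `freq` and plotting `G` and `phi` against it yields the gain and phase curves; a symbolic solver used as the baseline should sit at $G \approx 1$, $\varphi \approx 0$ across the sweep.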