Integrated and Cross-Architecture Interpretation of LLM Reasoning

📅 2026-05-27
🤖 AI Summary
This work addresses the opacity of large language model (LLM) reasoning by proposing the IAR framework, which integrates bandwidth-calibrated Mutual Information Peak (MIP) with Deep-Thinking Ratio (DTR) to characterize internal reasoning mechanisms. IAR uses Tukey IQR-based peak detection to identify reasoning-critical tokens, cross-layer overlap analysis between MIP-selected and DTR-deep tokens to track how those tokens evolve across layers, and a Jaccard stability metric to check that the identified tokens are consistent across problems. Experiments cover four domains (mathematics, code generation, logical reasoning, and commonsense) on Qwen-7B, Qwen-14B, and Llama-8B, indicating that the interpretation generalizes across architectures.
📝 Abstract
Understanding how LLMs reason is hindered by a practical asymmetry: while their generated outputs are observable, the underlying reasoning patterns remain opaque. Relying on single probes, such as Mutual Information Peak (MIP) or Deep-Thinking Ratio (DTR), risks underestimating the genuine inferential structure. To address this deficiency, we present an Integrated, cross-Architecture Reasoning (IAR) framework, designed to provide a unified approach to LLM reasoning interpretability. Specifically, we first use bandwidth-calibrated MIP coupled with Tukey IQR peak detection to isolate reasoning-crucial tokens at the output layer. Second, we perform an overlap analysis between MIP-picked tokens and DTR-deep tokens to trace the cross-layer trajectories of those tokens. This also reveals whether reasoning-crucial tokens are computation-intensive as well, which helps explain how reasoning patterns evolve across model layers. Finally, we apply a Jaccard stability metric over multi-domain problems to verify whether the MIP-identified tokens are reliable indicators of reasoning quality. Extensive experiments on three models (Qwen-7B, Qwen-14B, and Llama-8B) across four domains (mathematics, code, logic, and common sense) demonstrate IAR's generalizable interpretation capabilities across architectures.
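The two set-level operations named in the abstract, Tukey IQR peak detection and Jaccard overlap, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the per-token scores are made-up numbers standing in for bandwidth-calibrated mutual information estimates, the function names `tukey_peaks` and `jaccard` are hypothetical, the fence multiplier `k=1.5` is the conventional Tukey value rather than one confirmed by the abstract, and the DTR-deep index set is invented for the example.

```python
from typing import Sequence, Set


def quantile(sorted_vals: Sequence[float], q: float) -> float:
    """Linear-interpolation quantile of an already sorted, non-empty sequence."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def tukey_peaks(scores: Sequence[float], k: float = 1.5) -> Set[int]:
    """Return indices whose score exceeds the upper Tukey fence Q3 + k * IQR.

    Assumption: k = 1.5 is the conventional choice; the source does not
    state the constant it uses.
    """
    s = sorted(scores)
    q1, q3 = quantile(s, 0.25), quantile(s, 0.75)
    fence = q3 + k * (q3 - q1)
    return {i for i, v in enumerate(scores) if v > fence}


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard similarity |a & b| / |a | b|.

    Assumption: two empty sets are treated as identical (similarity 1.0).
    """
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Hypothetical per-token MI scores for one generated sequence.
    mi_scores = [0.10, 0.20, 0.15, 0.12, 0.18, 0.90, 0.11, 0.16]
    mip_tokens = tukey_peaks(mi_scores)

    # Hypothetical indices of tokens flagged as "deep" by a DTR-style probe.
    dtr_deep_tokens = {2, 5}

    print("MIP-picked token indices:", mip_tokens)
    print("Overlap with DTR-deep tokens:", jaccard(mip_tokens, dtr_deep_tokens))
```

With these made-up scores only the token at index 5 clears the upper fence, so the overlap with the invented DTR set `{2, 5}` is 0.5. The same `jaccard` helper could be applied to peak sets obtained from different problems or runs to gauge stability, although how the paper aggregates those comparisons is not specified in the abstract.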
Problem

Research questions and friction points this paper is trying to address.

LLM reasoning
interpretability
reasoning opacity
inferential structure
cross-architecture
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrated Interpretability
Cross-Architecture Analysis
Mutual Information Peak
Deep-Thinking Ratio
Jaccard Stability