🤖 AI Summary
This work addresses the opacity of large language model (LLM) reasoning by proposing the IAR framework, which combines a bandwidth-calibrated Mutual Information Peak (MIP) probe with the Deep-Thinking Ratio (DTR) to characterize internal reasoning mechanisms more completely than either probe alone. IAR uses Tukey IQR-based peak detection to identify reasoning-critical tokens at the output layer, an MIP-DTR overlap analysis to track those tokens across layers, and a Jaccard stability metric to check how consistently they are identified. Experiments on Qwen-7B, Qwen-14B, and Llama-8B across four domains (mathematics, code generation, logical reasoning, and commonsense) indicate that the resulting interpretations generalize across architectures.
📝 Abstract
Understanding how LLMs reason is hindered by a practical asymmetry: while their generated outputs are observable, the underlying reasoning patterns remain opaque. Relying on a single probe, such as Mutual Information Peak (MIP) or Deep-Thinking Ratio (DTR), risks underestimating the genuine inferential structure. To address this deficiency, we present an Integrated, cross-Architecture Reasoning (IAR) framework, designed to provide a unified approach to LLM reasoning interpretability. First, we use bandwidth-calibrated MIP coupled with Tukey IQR peak detection to isolate reasoning-crucial tokens at the output layer. Second, we perform an overlap analysis between MIP-picked tokens and DTR-deep tokens to trace the cross-layer trajectories of those tokens. This analysis also reveals whether reasoning-crucial tokens are computation-intensive as well, which helps explain how reasoning patterns evolve across model layers. Finally, we apply a Jaccard stability metric over multi-domain problems to verify whether the MIP-identified tokens are reliably associated with reasoning quality. Extensive experiments on three models (Qwen-7B, Qwen-14B, and Llama-8B) across four domains (mathematics, code, logic, and common sense) demonstrate IAR's generalizable interpretation capabilities across architectures.
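As a rough illustration of two building blocks named above, the following sketch shows Tukey IQR peak detection and a Jaccard overlap between token sets. It is a minimal example under stated assumptions, not the authors' implementation: the per-token scores are assumed to be precomputed (the bandwidth-calibrated MI estimate is not shown), the fence multiplier `k=1.5` is the conventional Tukey default rather than a value taken from the paper, and the names `tukey_peaks`, `jaccard`, `mi_scores`, and `dtr_deep_tokens` are hypothetical.

```python
from statistics import quantiles


def tukey_peaks(scores, k=1.5):
    """Return indices of scores above Tukey's upper fence, Q3 + k * IQR.

    `scores` stands in for per-token mutual-information values (assumed given).
    """
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    upper_fence = q3 + k * (q3 - q1)
    return [i for i, s in enumerate(scores) if s > upper_fence]


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two collections of token indices."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)


# Hypothetical per-token scores for one generated sequence (illustrative values only).
mi_scores = [0.10, 0.20, 0.15, 0.12, 0.18, 0.90, 0.11, 0.14]
mip_tokens = tukey_peaks(mi_scores)

# Hypothetical set of token positions flagged as "deep" by a DTR-style probe.
dtr_deep_tokens = [2, 5]
overlap = jaccard(mip_tokens, dtr_deep_tokens)
```

The same `jaccard` helper could serve both the MIP-DTR overlap analysis and a stability check (for example, comparing the peak sets obtained from repeated runs on the same problem), though how the paper aggregates these scores is not specified in the abstract.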