🤖 AI Summary
Large language models (LLMs) exhibit low intermediate-step reliability and poor interpretability in chain-of-thought (CoT) reasoning, particularly in mathematical domains. Method: Inspired by the Uniform Information Density (UID) hypothesis from psycholinguistics, we systematically investigate information-flow dynamics in LLM mathematical reasoning. We propose an entropy-based information density metric to model reasoning trajectories globally and conduct empirical analysis across multiple mathematical reasoning benchmarks. Contribution/Results: We find that successful reasoning paths display markedly non-uniform, oscillatory information density, in sharp contrast to the smooth, uniform information flow that UID predicts for human communication. This challenges the prevailing paradigm that machine reasoning should emulate human information distribution. Our findings reveal a statistically distinct intrinsic reasoning pattern in LLMs, providing both theoretical grounding and concrete design principles for developing more reliable and interpretable next-generation reasoning models.
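The summary does not spell out how the entropy-based density metric is computed. The sketch below is one plausible formulation, assuming each reasoning step is scored by the mean surprisal (negative log-probability, in nats) of its tokens under a causal LM; the model name (`gpt2`) and the step segmentation are illustrative placeholders, not the paper's setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # placeholder scoring model, not necessarily the paper's choice
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

def step_densities(prompt: str, steps: list[str]) -> list[float]:
    """Mean per-token surprisal (nats) of each CoT step given all prior text.

    Assumes the caller has already split the trace into steps (e.g. on
    newlines); token boundaries at the prefix/step seam are approximate.
    """
    densities, prefix = [], prompt
    for step in steps:
        n_prefix = tok(prefix, return_tensors="pt").input_ids.shape[1]
        full_ids = tok(prefix + step, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(full_ids).logits
        # log-prob of each realized token given its left context
        logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
        targets = full_ids[0, 1:]
        token_lp = logprobs[torch.arange(targets.numel()), targets]
        # keep only the tokens belonging to the current step
        densities.append(-token_lp[n_prefix - 1:].mean().item())
        prefix += step
    return densities
```

Plotting `step_densities(question, steps)` across a trace yields the information-density trajectory whose (non-)uniformity the paper analyzes.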
📝 Abstract
Large language models (LLMs) often solve problems using step-by-step Chain-of-Thought (CoT) reasoning, yet these intermediate steps are frequently unfaithful or hard to interpret. Inspired by the Uniform Information Density (UID) hypothesis in psycholinguistics -- which posits that humans communicate by maintaining a stable flow of information -- we introduce entropy-based metrics to analyze the information flow within reasoning traces. Surprisingly, across three challenging mathematical benchmarks, we find that successful reasoning in LLMs is globally non-uniform: correct solutions are characterized by sharp swings in information density, in stark contrast to human communication patterns. This result challenges the assumption that machine reasoning should mirror human information distribution and suggests new directions for designing interpretable and adaptive reasoning models.
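To make "globally non-uniform" concrete: a standard operationalization in the UID literature scores a trajectory's deviation from uniformity, for example via the variance of per-step surprisal and the mean squared jump between adjacent steps (which captures oscillation). The sketch below implements one such scoring; it is an assumed operationalization, not the paper's own statistic.

```python
from statistics import mean

def uid_deviation(densities: list[float]) -> tuple[float, float]:
    """Two non-uniformity scores for an information-density profile:
    variance around the global mean, and mean squared difference between
    adjacent steps (sensitive to local oscillation)."""
    mu = mean(densities)
    variance = mean((d - mu) ** 2 for d in densities)
    local = (
        mean((b - a) ** 2 for a, b in zip(densities, densities[1:]))
        if len(densities) > 1
        else 0.0
    )
    return variance, local
```

Under the paper's finding, correct traces would tend to score higher on both quantities than the smooth, flat profiles UID predicts for human text.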