🤖 AI Summary
Large language models (LLMs) generate long autoregressive reasoning chains in which every token depends on all previous ones, making reasoning paths hard to decompose and interpret. To address this, we introduce *thought anchors*: sentence-level reasoning steps that exert disproportionately strong influence on subsequent inference. We propose an attribution framework integrating three complementary techniques: black-box counterfactual resampling, white-box attention aggregation, and causal attention suppression, augmented by attention-head pattern analysis and sentence-to-sentence dependency mapping. Experiments demonstrate the efficacy of sentence-level attribution, revealing that planning and backtracking sentences serve as critical anchors, with strong consistency across methods. We release Thought Anchors (www.thought-anchors.com), an open-source visualization toolkit that enhances the interpretability and diagnosis of LLM reasoning processes.
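The black-box counterfactual technique can be illustrated with a minimal sketch: resample rollouts with and without a given sentence, collect the final answers from each set, and score the sentence by how much the answer distribution shifts. The total-variation metric and function names below are hypothetical illustrations, not the paper's exact implementation.

```python
from collections import Counter

def counterfactual_importance(kept_answers, resampled_answers):
    """Score a sentence by the shift in the final-answer distribution
    between rollouts conditioned on keeping the sentence and rollouts
    where it was resampled with a different meaning.

    Uses total-variation distance between the two empirical answer
    distributions (a hypothetical choice of divergence)."""
    def dist(answers):
        counts = Counter(answers)
        n = len(answers)
        return {a: c / n for a, c in counts.items()}

    p, q = dist(kept_answers), dist(resampled_answers)
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in support)
```

For example, if 8 of 10 rollouts that kept the sentence answered "42" but only 4 of 10 resampled rollouts did, the importance score is 0.4; identical answer distributions score 0.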
📝 Abstract
Reasoning large language models have recently achieved state-of-the-art performance in many fields. However, their long-form chain-of-thought reasoning creates interpretability challenges, as each generated token depends on all previous ones, making the computation harder to decompose. We argue that analyzing reasoning traces at the sentence level is a promising approach to understanding reasoning processes. We present three complementary attribution methods: (1) a black-box method measuring each sentence's counterfactual importance by comparing final answers across 100 rollouts conditioned on the model generating that sentence or one with a different meaning; (2) a white-box method aggregating attention patterns between pairs of sentences, which identifies "broadcasting" sentences that receive disproportionate attention from all future sentences via "receiver" attention heads; (3) a causal attribution method measuring logical connections between sentences by suppressing attention toward one sentence and measuring the effect on each future sentence's tokens. Each method provides evidence for the existence of thought anchors: reasoning steps that have outsized importance and disproportionately influence the subsequent reasoning process. These thought anchors are typically planning or backtracking sentences. We provide an open-source tool (www.thought-anchors.com) for visualizing the outputs of our methods, and present a case study showing converging patterns across methods that map how a model performs multi-step reasoning. The consistency across methods demonstrates the potential of sentence-level analysis for a deeper understanding of reasoning models.
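The white-box aggregation step can be sketched as follows: collapse a token-level attention matrix into a sentence-level one by averaging over each sentence-pair block, then score each sentence by the attention it receives from strictly later sentences. High scorers are candidate "broadcasting" sentences. The function names and the mean-incoming-attention score are illustrative assumptions; the paper's exact aggregation and receiver-head selection may differ.

```python
import numpy as np

def sentence_attention(attn, bounds):
    """Collapse a token-level attention matrix `attn` (T x T, row i gives
    token i's attention over earlier tokens) into a sentence-level matrix
    by averaging over each sentence-pair block. `bounds` lists each
    sentence's (start, end) token span, end-exclusive."""
    n = len(bounds)
    sent_attn = np.zeros((n, n))
    for i, (qs, qe) in enumerate(bounds):
        for j, (ks, ke) in enumerate(bounds):
            sent_attn[i, j] = attn[qs:qe, ks:ke].mean()
    return sent_attn

def broadcasting_scores(sent_attn):
    """Score each sentence by the mean attention it receives from all
    strictly later sentences (a hypothetical scoring rule). The last
    sentence has no later sentences, so its score is NaN."""
    n = sent_attn.shape[0]
    scores = np.full(n, np.nan)
    for j in range(n - 1):
        scores[j] = sent_attn[j + 1:, j].mean()
    return scores
```

In practice, `attn` would come from one attention head of the model (e.g. via a forward pass that returns attention weights), and scores would be inspected per head to find the "receiver" heads that concentrate attention on a few sentences.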