Verifying Chain-of-Thought Reasoning via Its Computational Graph

📅 2025-10-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing black-box and gray-box Chain-of-Thought (CoT) verification methods fail to uncover the deep structural causes of reasoning errors. To address this, we propose Circuit-based Reasoning Verification (CRV), a white-box approach that treats the attribution graph of each reasoning step as an execution trace of the model's latent reasoning circuits and extracts structural "fingerprints" from it. CRV uses topological and path-based features of these graphs to enable interpretable verification of large language models' reasoning. Our analysis reveals that error patterns are task-specific and that the structural signals are causally valid: CRV achieves high-accuracy error prediction (AUC > 0.92) and precisely localizes critical features, enabling targeted interventions that correct erroneous reasoning. CRV thus bridges three key gaps — error detection, causal understanding, and controllable correction — establishing a novel paradigm for trustworthy reasoning in large language models.

📝 Abstract
Current Chain-of-Thought (CoT) verification methods predict reasoning correctness based on outputs (black-box) or activations (gray-box), but offer limited insight into why a computation fails. We introduce a white-box method: Circuit-based Reasoning Verification (CRV). We hypothesize that attribution graphs of correct CoT steps, viewed as execution traces of the model's latent reasoning circuits, possess distinct structural fingerprints from those of incorrect steps. By training a classifier on structural features of these graphs, we show that these traces contain a powerful signal of reasoning errors. Our white-box approach yields novel scientific insights unattainable by other methods. (1) We demonstrate that structural signatures of error are highly predictive, establishing the viability of verifying reasoning directly via its computational graph. (2) We find these signatures to be highly domain-specific, revealing that failures in different reasoning tasks manifest as distinct computational patterns. (3) We provide evidence that these signatures are not merely correlational; by using our analysis to guide targeted interventions on individual transcoder features, we successfully correct the model's faulty reasoning. Our work shows that, by scrutinizing a model's computational process, we can move from simple error detection to a deeper, causal understanding of LLM reasoning.
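The verification recipe the abstract describes — summarize each step's attribution graph with structural features, then train a classifier to predict step correctness — can be sketched roughly as below. This is an illustrative reconstruction, not the paper's code: the feature set (node/edge counts, density, depth, branching) and the choice of classifier are stand-in assumptions, and the graphs are assumed to be available as `networkx` DiGraphs.

```python
# Hedged sketch of the CRV-style pipeline: structural features of a reasoning
# step's attribution graph -> supervised correctness classifier.
# Feature names and the classifier choice are illustrative assumptions.
import networkx as nx
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier


def graph_features(g: nx.DiGraph) -> np.ndarray:
    """Summarize one step's attribution graph with simple structural statistics."""
    n = g.number_of_nodes()
    m = g.number_of_edges()
    density = nx.density(g) if n > 1 else 0.0
    # Longest path as a crude proxy for circuit depth (only defined for DAGs).
    depth = nx.dag_longest_path_length(g) if nx.is_directed_acyclic_graph(g) else 0
    avg_out_degree = m / n if n else 0.0
    return np.array([n, m, density, depth, avg_out_degree], dtype=float)


def train_verifier(graphs, labels):
    """Fit a classifier that predicts step correctness (1) vs. error (0)
    from graph structure alone."""
    X = np.stack([graph_features(g) for g in graphs])
    clf = GradientBoostingClassifier()
    clf.fit(X, np.asarray(labels))
    return clf
```

In this framing the interesting scientific claim is not the classifier itself but that such coarse structural summaries carry enough signal to separate correct from incorrect steps.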
Problem

Research questions and friction points this paper is trying to address.

Verifying reasoning correctness via computational graph structural analysis
Identifying domain-specific computational patterns in reasoning failures
Providing causal understanding to correct faulty LLM reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

White-box verification via computational graph structure
Training classifier on structural fingerprints of reasoning
Targeted interventions correct faulty reasoning patterns
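The third contribution — using the analysis to guide causal interventions on individual transcoder features — amounts to rescaling or ablating one feature's activation before decoding it back into the model's hidden state. A minimal sketch, with all names and the exact intervention mechanics assumed rather than taken from the paper:

```python
# Hedged sketch of a targeted feature intervention: suppress one transcoder
# feature implicated in an erroneous reasoning step. The decoder matrix and
# the ablation-by-scaling mechanism are illustrative assumptions.
import numpy as np


def intervene(feature_acts: np.ndarray, decoder: np.ndarray,
              target: int, scale: float = 0.0) -> np.ndarray:
    """Rescale one feature's activation, then decode back to the hidden state.

    feature_acts: (..., n_features) transcoder feature activations
    decoder:      (n_features, d_model) decoder weight matrix
    target:       index of the feature to intervene on
    scale:        0.0 ablates the feature entirely; other values rescale it
    """
    acts = feature_acts.copy()
    acts[..., target] *= scale
    # The reconstructed hidden state would replace the original activation
    # at this layer, and the forward pass continues from there.
    return acts @ decoder
```

If the structural signatures were merely correlational, such an edit would not change downstream behavior; the paper's claim is that it does, turning detection into correction.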