Verifying Chain-of-Thought Reasoning via Its Computational Graph

📅 2025-10-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing black-box and gray-box Chain-of-Thought (CoT) verification methods fail to uncover the deep structural causes of reasoning errors. To address this, we propose Circuit-based Reasoning Verification (CRV), a white-box approach that treats the attribution graph of each reasoning step as an execution trace of the model's latent reasoning circuits and extracts structural "fingerprints" from it. CRV uses topological and path-based features of these graphs to enable interpretable verification of large language models' reasoning. Our analysis reveals that error patterns are task-specific and that the structural signals are causally valid: CRV achieves high-accuracy error prediction (AUC > 0.92) and precisely localizes critical features, enabling targeted interventions that correct erroneous reasoning. CRV thus bridges three key gaps — error detection, causal understanding, and controllable correction — establishing a novel paradigm for trustworthy reasoning in large language models.

📝 Abstract
Current Chain-of-Thought (CoT) verification methods predict reasoning correctness based on outputs (black-box) or activations (gray-box), but offer limited insight into why a computation fails. We introduce a white-box method: Circuit-based Reasoning Verification (CRV). We hypothesize that attribution graphs of correct CoT steps, viewed as execution traces of the model's latent reasoning circuits, possess distinct structural fingerprints from those of incorrect steps. By training a classifier on structural features of these graphs, we show that these traces contain a powerful signal of reasoning errors. Our white-box approach yields novel scientific insights unattainable by other methods. (1) We demonstrate that structural signatures of error are highly predictive, establishing the viability of verifying reasoning directly via its computational graph. (2) We find these signatures to be highly domain-specific, revealing that failures in different reasoning tasks manifest as distinct computational patterns. (3) We provide evidence that these signatures are not merely correlational; by using our analysis to guide targeted interventions on individual transcoder features, we successfully correct the model's faulty reasoning. Our work shows that, by scrutinizing a model's computational process, we can move from simple error detection to a deeper, causal understanding of LLM reasoning.
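The verification recipe the abstract describes — summarize each step's attribution graph with structural features, then train a classifier to predict step correctness — can be sketched roughly as below. This is an illustrative reconstruction, not the paper's code: the feature set (node/edge counts, density, depth, branching) and the choice of classifier are stand-in assumptions, and the graphs are assumed to be available as `networkx` DiGraphs.

```python
# Hedged sketch of the CRV-style pipeline: structural features of a reasoning
# step's attribution graph -> supervised correctness classifier.
# Feature names and the classifier choice are illustrative assumptions.
import networkx as nx
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier


def graph_features(g: nx.DiGraph) -> np.ndarray:
    """Summarize one step's attribution graph with simple structural statistics."""
    n = g.number_of_nodes()
    m = g.number_of_edges()
    density = nx.density(g) if n > 1 else 0.0
    # Longest path as a crude proxy for circuit depth (only defined for DAGs).
    depth = nx.dag_longest_path_length(g) if nx.is_directed_acyclic_graph(g) else 0
    avg_out_degree = m / n if n else 0.0
    return np.array([n, m, density, depth, avg_out_degree], dtype=float)


def train_verifier(graphs, labels):
    """Fit a classifier that predicts step correctness (1) vs. error (0)
    from graph structure alone."""
    X = np.stack([graph_features(g) for g in graphs])
    clf = GradientBoostingClassifier()
    clf.fit(X, np.asarray(labels))
    return clf
```

In this framing the interesting scientific claim is not the classifier itself but that such coarse structural summaries carry enough signal to separate correct from incorrect steps.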
Problem

Research questions and friction points this paper is trying to address.

Verifying reasoning correctness via computational graph structural analysis
Identifying domain-specific computational patterns in reasoning failures
Providing causal understanding to correct faulty LLM reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

White-box verification via computational graph structure
Training classifier on structural fingerprints of reasoning
Targeted interventions correct faulty reasoning patterns
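The third contribution — using the analysis to guide causal interventions on individual transcoder features — amounts to rescaling or ablating one feature's activation before decoding it back into the model's hidden state. A minimal sketch, with all names and the exact intervention mechanics assumed rather than taken from the paper:

```python
# Hedged sketch of a targeted feature intervention: suppress one transcoder
# feature implicated in an erroneous reasoning step. The decoder matrix and
# the ablation-by-scaling mechanism are illustrative assumptions.
import numpy as np


def intervene(feature_acts: np.ndarray, decoder: np.ndarray,
              target: int, scale: float = 0.0) -> np.ndarray:
    """Rescale one feature's activation, then decode back to the hidden state.

    feature_acts: (..., n_features) transcoder feature activations
    decoder:      (n_features, d_model) decoder weight matrix
    target:       index of the feature to intervene on
    scale:        0.0 ablates the feature entirely; other values rescale it
    """
    acts = feature_acts.copy()
    acts[..., target] *= scale
    # The reconstructed hidden state would replace the original activation
    # at this layer, and the forward pass continues from there.
    return acts @ decoder
```

If the structural signatures were merely correlational, such an edit would not change downstream behavior; the paper's claim is that it does, turning detection into correction.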