When the Chain Breaks: Interactive Diagnosis of LLM Chain-of-Thought Reasoning Errors

📅 2026-03-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of diagnosing errors in chain-of-thought (CoT) reasoning generated by large language models, whose reasoning traces are often verbose and prone to logical or factual inaccuracies. To this end, the authors propose a step-level error detection method that integrates external fact-checking with symbolic logical verification. They further develop ReasonDiag, an interactive visualization system that combines arc diagrams and hierarchical node-link graphs to reveal the reasoning flow and trace error-propagation paths. Through a technical evaluation, two case studies, and user interviews with 16 participants, the study demonstrates that ReasonDiag effectively supports users in comprehending complex reasoning processes, accurately identifying erroneous steps, and tracing underlying root causes.

📝 Abstract
Current Large Language Models (LLMs), especially Large Reasoning Models, can generate Chain-of-Thought (CoT) reasoning traces to illustrate how they produce final outputs, thereby facilitating trust calibration for users. However, these CoT reasoning traces are usually lengthy and tedious, and can contain various issues, such as logical and factual errors, which make it difficult for users to interpret the reasoning traces efficiently and accurately. To address these challenges, we develop an error detection pipeline that combines external fact-checking with symbolic logical validation to identify errors at the step level. Building on this pipeline, we propose ReasonDiag, an interactive visualization system for diagnosing CoT reasoning traces. ReasonDiag provides 1) an integrated arc diagram to show reasoning-step distributions and error-propagation patterns, and 2) a hierarchical node-link diagram to visualize high-level reasoning flows and premise dependencies. We evaluate ReasonDiag through a technical evaluation of the error detection pipeline, two case studies, and user interviews with 16 participants. The results indicate that ReasonDiag helps users effectively understand CoT reasoning traces, identify erroneous steps, and determine their root causes.
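The paper's pipeline details are not given on this page, but the symbolic-validation idea behind step-level error detection can be illustrated with a minimal, hypothetical sketch: each CoT step is modeled as a set of premises and a conclusion over propositional variables, and a step is flagged as erroneous when a brute-force truth-table check finds an assignment where all premises hold but the conclusion fails. All names here (`entails`, `check_step`, the step dictionaries) are illustrative assumptions, not the authors' implementation.

```python
from itertools import product

def entails(premises, conclusion, variables):
    """Brute-force truth-table check: do the premises jointly entail the conclusion?"""
    for assignment in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, assignment))
        if all(p(env) for p in premises) and not conclusion(env):
            return False  # counterexample: all premises true, conclusion false
    return True

def check_step(step):
    """Flag a reasoning step as a logical error if its conclusion
    does not follow from its stated premises."""
    ok = entails(step["premises"], step["conclusion"], step["variables"])
    return "ok" if ok else "logical error"

# Valid step: from (A -> B) and A, conclude B (modus ponens).
valid_step = {
    "variables": ["A", "B"],
    "premises": [lambda e: (not e["A"]) or e["B"], lambda e: e["A"]],
    "conclusion": lambda e: e["B"],
}

# Invalid step: from (A -> B) and B, conclude A (affirming the consequent).
invalid_step = {
    "variables": ["A", "B"],
    "premises": [lambda e: (not e["A"]) or e["B"], lambda e: e["B"]],
    "conclusion": lambda e: e["A"],
}

print(check_step(valid_step))    # ok
print(check_step(invalid_step))  # logical error
```

In a real pipeline, the hard part is translating natural-language CoT steps into such formal premises (and routing factual claims to an external fact-checker); the check above only covers the symbolic half, and truth-table enumeration is exponential in the number of variables, so practical systems would use a SAT or SMT solver instead.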
Problem

Research questions and friction points this paper is trying to address.

Chain-of-Thought
reasoning errors
error diagnosis
Large Language Models
interpretability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Chain-of-Thought
Error Detection
Interactive Visualization
Logical Validation
Fact-Checking
Shiwei Chen
College of Computing and Data Science, Nanyang Technological University, Singapore
Niruthikka Sritharan
College of Computing and Data Science, Nanyang Technological University, Singapore
Xiaolin Wen
College of Computing and Data Science, Nanyang Technological University, Singapore
Chenxi Zhang
College of Computing and Data Science, Nanyang Technological University, Singapore
Xingbo Wang
Research Scientist, Bosch Center for AI
HCI, Natural Language Processing, Multimodality, Computer Vision
Yong Wang
Assistant Professor, Nanyang Technological University
Data Visualization, HCI, Human-AI Collaboration, FinTech, Quantum Computing