🤖 AI Summary
Problem: Prior work has not clarified how the structural characteristics of long chain-of-thought (LCoT) reasoning affect the correctness of a large language model's (LLM's) final answers. Method: We propose LCoT2Tree, an automated framework that models an LCoT as a hierarchical tree, enabling systematic analysis of structural patterns (exploration, backtracking, and verification) as predictors of answer correctness. We train a graph neural network (GNN)-based diagnostic model on these tree structures and apply an explainability technique to identify interpretable failure modes such as over-branching. Contribution/Results: Experiments across multiple tasks and LLMs show that these structural patterns are stronger predictors of final-answer correctness. Using the structural signals to guide Best-of-N decoding further improves its effectiveness. This work positions structural modeling as a tool for diagnosing, interpreting, and improving LLM reasoning.
📝 Abstract
Recent advances in reasoning with large language models (LLMs) have popularized Long Chain-of-Thought (LCoT), a strategy that encourages deliberate, step-by-step reasoning before producing a final answer. While LCoTs have enabled expert-level performance on complex tasks, how the internal structures of their reasoning chains drive, or even predict, the correctness of final answers remains a critical yet underexplored question. In this work, we present LCoT2Tree, an automated framework that converts sequential LCoTs into hierarchical tree structures and thus enables deeper structural analysis of LLM reasoning. Using graph neural networks (GNNs), we reveal that structural patterns extracted by LCoT2Tree, including exploration, backtracking, and verification, serve as stronger predictors of final performance across a wide range of tasks and models. Leveraging an explainability technique, we further identify critical thought patterns, such as over-branching, that account for failures. Beyond diagnostic insights, the structural patterns extracted by LCoT2Tree support practical applications, including improving the effectiveness of Best-of-N decoding. Overall, our results underscore the critical role of the internal structures of reasoning chains, positioning LCoT2Tree as a powerful tool for diagnosing, interpreting, and improving reasoning in LLMs.
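To make the idea of a tree-structured reasoning chain concrete, the following is a minimal Python sketch, not the LCoT2Tree implementation. It assumes each reasoning step has already been annotated with a parent step and a kind label. The `ThoughtNode` class, the `build_tree` and `structural_features` helpers, the label names, and the chosen features are all hypothetical illustrations of how exploration, backtracking, and verification could be counted once a chain is in tree form.

```python
from dataclasses import dataclass, field


@dataclass
class ThoughtNode:
    """One reasoning step; the `kind` labels are hypothetical, not the paper's taxonomy."""
    step_id: int
    kind: str  # e.g. "explore", "backtrack", "verify"
    children: list["ThoughtNode"] = field(default_factory=list)


def build_tree(steps):
    """Build a tree from (step_id, parent_id, kind) triples; parent_id None marks the root."""
    nodes = {sid: ThoughtNode(sid, kind) for sid, _, kind in steps}
    root = None
    for sid, parent, _ in steps:
        if parent is None:
            root = nodes[sid]
        else:
            nodes[parent].children.append(nodes[sid])
    return root


def structural_features(root):
    """Return tree depth, maximum branching factor, and per-kind step counts (iterative DFS)."""
    feats = {"depth": 0, "max_branching": 0, "kinds": {}}
    stack = [(root, 1)]  # the root counts as depth 1
    while stack:
        node, depth = stack.pop()
        feats["depth"] = max(feats["depth"], depth)
        feats["max_branching"] = max(feats["max_branching"], len(node.children))
        feats["kinds"][node.kind] = feats["kinds"].get(node.kind, 0) + 1
        stack.extend((child, depth + 1) for child in node.children)
    return feats


# Hypothetical annotated chain: the root step branches into three explorations.
steps = [
    (0, None, "explore"),
    (1, 0, "explore"),
    (2, 0, "explore"),
    (3, 0, "explore"),
    (4, 1, "verify"),
    (5, 2, "backtrack"),
]
features = structural_features(build_tree(steps))
print(features)
```

In this sketch a high `max_branching` value is one crude way an over-branching chain might show up. The abstract describes learning such signals with a GNN over the whole tree rather than relying on hand-picked features like these.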