EntroCoT: Enhancing Chain-of-Thought via Adaptive Entropy-Guided Segmentation

📅 2026-01-07
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses a critical flaw in existing chain-of-thought (CoT) fine-tuning data, where correct final answers are often accompanied by flawed intermediate reasoning—such as hallucinations, redundancies, or logical errors. To mitigate this, the authors propose EntroCoT, a novel framework that introduces entropy-guided dynamic reasoning segmentation and step-level marginal contribution evaluation. Specifically, EntroCoT adaptively partitions reasoning trajectories using information entropy and employs Monte Carlo rollouts to quantify each step’s contribution to the final answer, enabling automatic filtering of low-quality samples. Experiments across multiple mathematical reasoning benchmarks demonstrate that models fine-tuned on high-quality CoT datasets curated by EntroCoT significantly outperform those trained on full original datasets, highlighting the effectiveness of the proposed approach in enhancing reasoning fidelity and model performance.
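The entropy-guided segmentation described in the summary can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes per-token predictive entropies from the model's output distribution are already available, and `threshold` is a hypothetical hyperparameter marking an "uncertain juncture".

```python
def segment_by_entropy(tokens, entropies, threshold=1.5):
    """Split a reasoning trace into steps at high-entropy junctures.

    tokens    : list of decoded tokens in the CoT trace
    entropies : per-token predictive entropy, same length as tokens
    threshold : hypothetical cut-off; a token whose entropy exceeds it
                closes the current step (an uncertain juncture)
    """
    steps, current = [], []
    for tok, h in zip(tokens, entropies):
        current.append(tok)
        if h >= threshold:      # uncertain juncture: end the step here
            steps.append(current)
            current = []
    if current:                 # flush any trailing tokens
        steps.append(current)
    return steps
```

Because the cut points follow the model's own uncertainty rather than fixed delimiters (newlines, sentence boundaries), the resulting steps adapt to each trace, which is what makes the partitioning "dynamic".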

📝 Abstract
Chain-of-Thought (CoT) prompting has significantly enhanced the mathematical reasoning capabilities of Large Language Models. We find that existing fine-tuning datasets frequently suffer from the "answer right but reasoning wrong" problem, where correct final answers are derived from hallucinated, redundant, or logically invalid intermediate steps. This paper proposes EntroCoT, a unified framework for automatically identifying and refining low-quality CoT supervision traces. EntroCoT first proposes an entropy-based mechanism to segment the reasoning trace into multiple steps at uncertain junctures, and then introduces a Monte Carlo rollout-based mechanism to evaluate the marginal contribution of each step. By accurately filtering deceptive reasoning samples, EntroCoT constructs a high-quality dataset where every intermediate step in each reasoning trace facilitates the final answer. Extensive experiments on mathematical benchmarks demonstrate that fine-tuning on the subset constructed by EntroCoT consistently outperforms the baselines of full-dataset supervision.
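The step-level marginal-contribution idea can be sketched as follows: estimate, via Monte Carlo rollouts, the probability of reaching the correct answer given each prefix of steps, and score each step by how much it raises that probability. Here `rollout_acc` is a hypothetical callable standing in for actual model rollouts; the exact estimator used by EntroCoT may differ.

```python
def marginal_contributions(n_steps, rollout_acc):
    """Score each reasoning step by its marginal contribution.

    rollout_acc(i) should return a Monte Carlo estimate of
    P(correct final answer | first i steps kept as the prefix),
    e.g. obtained by sampling continuations from the model.
    Step i's contribution is the accuracy gain from including it.
    """
    accs = [rollout_acc(i) for i in range(n_steps + 1)]
    return [accs[i + 1] - accs[i] for i in range(n_steps)]
```

Steps with near-zero or negative contribution (redundant or misleading ones) can then be flagged, giving an automatic criterion for filtering "answer right but reasoning wrong" samples.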
Problem

Research questions and friction points this paper is trying to address.

Chain-of-Thought
reasoning quality
hallucination
fine-tuning dataset
mathematical reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Chain-of-Thought
entropy-based segmentation
Monte Carlo rollout
reasoning quality filtering
large language models
Authors
Zihang Li, Peking University
Yuhang Wang, Peking University
Yikun Zong, Peking University
Wenhan Yu, Peking University
Xiaokun Yuan, Peking University
Runhan Jiang, IQuest Research
Zirui Liu, Peking University (Systems, Algorithms, Data Structures)
Tong Yang, Peking University (Sketch, Network measurement, Bloom filter, IP lookup, Hash Table)
Arthur Jiang, IQuest Research