CoT-Evo: Evolutionary Distillation of Chain-of-Thought for Scientific Reasoning

📅 2025-10-15
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing chain-of-thought (CoT) distillation methods underperform on scientific reasoning tasks primarily because large language models often produce erroneous or superficial reasoning traces, yielding low-quality distillation data. To address this, we propose an evolutionary CoT distillation framework: it first generates initial reasoning trajectories via multi-LLM collaborative inference augmented with domain knowledge; it then defines a fitness function grounded in answer correctness and logical coherence; and, as its central novelty, it brings evolutionary computation into CoT optimization through novelty-based selection, reflective recombination, and mutation operators that iteratively refine reasoning quality. Crucially, the method decouples distillation quality from teacher-model accuracy, enabling high-fidelity training data to be extracted even from flawed reasoning. Experiments show that the evolved CoT dataset substantially improves small student models, achieving state-of-the-art results across multiple scientific reasoning benchmarks.

📝 Abstract
While chain-of-thought (CoT) distillation from advanced large language models (LLMs) has proven effective in general reasoning tasks, it struggles in scientific domains where even advanced models often produce incorrect or superficial reasoning due to high complexity and specialized knowledge requirements. Directly distilling from such flawed outputs results in low-quality training data and limits the performance of smaller student models. To overcome this, we propose CoT-Evo, an evolutionary CoT distillation framework. It begins by constructing a diverse pool of reasoning trajectories from multiple LLM thinkers, enriches them with automatically retrieved domain knowledge, and iteratively refines the trajectories using novelty-driven selection, reflective recombination, and mutation. The refinement is guided by a fitness function that evaluates answer correctness, coherence, and effective knowledge utilization. This results in a high-quality CoT dataset tailored for scientific reasoning. We employ this evolved dataset to fine-tune a compact model, which achieves state-of-the-art performance on scientific reasoning benchmarks. Our work establishes a scalable approach to synthesizing high-fidelity scientific reasoning data from diverse and fallible LLMs.
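The pipeline the abstract describes — a pool of candidate trajectories refined by novelty-driven selection, recombination, and mutation under a fitness function — follows the shape of a standard evolutionary loop. A minimal sketch of that loop is below; every function body, weight, and operator here is an illustrative assumption (e.g., the toy coherence proxy and the step-splicing recombination), not the paper's actual implementation, which uses LLM-driven reflection rather than string operations.

```python
import random
from dataclasses import dataclass

# Illustrative sketch of an evolutionary CoT-refinement loop.
# All weights, operators, and scoring heuristics are assumptions.

@dataclass
class Trajectory:
    steps: list[str]   # chain-of-thought reasoning steps
    answer: str        # final answer the trajectory arrives at

def fitness(t: Trajectory, gold_answer: str) -> float:
    """Toy fitness: answer correctness plus a crude coherence proxy
    (non-repetitive chains score slightly higher)."""
    correct = 1.0 if t.answer == gold_answer else 0.0
    coherence = len(set(t.steps)) / max(len(t.steps), 1)
    return 0.8 * correct + 0.2 * coherence

def novelty(t: Trajectory, population: list[Trajectory]) -> float:
    """Novelty as the fraction of steps not shared with the rest of the pool."""
    others = {s for p in population if p is not t for s in p.steps}
    unique = sum(1 for s in t.steps if s not in others)
    return unique / max(len(t.steps), 1)

def recombine(a: Trajectory, b: Trajectory) -> Trajectory:
    """Splice a prefix of one trajectory onto a suffix of another
    (a stand-in for the paper's reflective recombination)."""
    cut_a = random.randint(1, len(a.steps))
    cut_b = random.randint(0, len(b.steps))
    return Trajectory(a.steps[:cut_a] + b.steps[cut_b:], b.answer)

def mutate(t: Trajectory) -> Trajectory:
    """Drop one step at random (a stand-in for LLM-driven rewriting)."""
    if len(t.steps) <= 1:
        return t
    i = random.randrange(len(t.steps))
    return Trajectory(t.steps[:i] + t.steps[i + 1:], t.answer)

def evolve(population: list[Trajectory], gold: str, generations: int = 5) -> Trajectory:
    for _ in range(generations):
        # Select parents by combined fitness and novelty, keep the top half.
        scored = sorted(population,
                        key=lambda t: fitness(t, gold) + 0.3 * novelty(t, population),
                        reverse=True)
        parents = scored[: max(2, len(scored) // 2)]
        children = [mutate(recombine(random.choice(parents), random.choice(parents)))
                    for _ in range(len(population))]
        population = parents + children
    return max(population, key=lambda t: fitness(t, gold))
```

Because selection keeps high-fitness parents each generation, a correct trajectory, once present in the pool, survives refinement — which mirrors the abstract's point that the final dataset quality is decoupled from any single teacher's accuracy.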
Problem

Research questions and friction points this paper is trying to address.

Improving scientific reasoning in small language models
Overcoming flawed chain-of-thought distillation from imperfect LLMs
Generating high-quality reasoning data for specialized knowledge domains
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evolutionary distillation framework refines reasoning trajectories
Novelty-driven selection with fitness-guided iterative refinement
Knowledge-enriched CoT dataset fine-tunes compact scientific model
Kehua Feng
Ph.D. student, Zhejiang University
Natural Language Processing, Language Model, AI for Science
Keyan Ding
Zhejiang University, ZJU-Hangzhou Global Scientific and Technological Innovation Center, AntGroup
Zhihui Zhu
Assistant Professor, Ohio State University
Machine Learning, Data Science, Signal Processing, Optimization
Lei Liang
Ant Group
Knowledge Graph, AI
Qiang Zhang
Zhejiang University, ZJU-Hangzhou Global Scientific and Technological Innovation Center, AntGroup
Huajun Chen
Zhejiang University, ZJU-Hangzhou Global Scientific and Technological Innovation Center, AntGroup