CoTEvol: Self-Evolving Chain-of-Thoughts for Data Synthesis in Mathematical Reasoning

📅 2026-04-16

🤖 AI Summary

High-quality chain-of-thought (CoT) data is costly to construct, which limits the performance gains of large language models in mathematical reasoning. This work proposes CoTEvol, which integrates genetic evolutionary algorithms into CoT synthesis by framing reasoning-path generation as a population-based search. It employs trajectory-level global crossover for holistic recombination and step-level, uncertainty-guided local mutation for fine-grained refinement, complemented by a lightweight, task-aware fitness function that balances accuracy and diversity. Experiments show that CoTEvol improves the success rate of generating correct CoTs by over 30% while also enhancing structural diversity. Models trained on CoTEvol-generated data achieve an average performance gain of 6.6% across eight mathematical benchmarks, outperforming existing distillation and self-synthesis methods.
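
The fitness function is described only as lightweight, task-aware, and balancing accuracy against diversity; its exact form is not given here. The following is a minimal, hypothetical sketch of one such function, not CoTEvol's actual definition. It assumes accuracy is an exact match of the final answer against a reference, diversity is one minus the highest Jaccard overlap between a trajectory's steps and those of any other population member, and the two terms are mixed by an illustrative `diversity_weight`.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    steps: list[str]  # reasoning steps, one string per step
    answer: str       # final answer extracted from the trajectory


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two sets of steps (1.0 means identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def fitness(
    traj: Trajectory,
    population: list[Trajectory],
    reference_answer: str,
    diversity_weight: float = 0.5,  # hypothetical trade-off parameter
) -> float:
    """Hypothetical fitness: answer accuracy plus a weighted diversity bonus."""
    accuracy = 1.0 if traj.answer.strip() == reference_answer.strip() else 0.0
    others = [p for p in population if p is not traj]
    # Similarity to the closest other member; a lone trajectory counts as fully novel.
    max_sim = max((jaccard(set(traj.steps), set(o.steps)) for o in others), default=0.0)
    diversity = 1.0 - max_sim
    return accuracy + diversity_weight * diversity


pop = [
    Trajectory(["a", "b"], "4"),
    Trajectory(["a", "c"], "4"),
    Trajectory(["a", "b"], "5"),
]
scores = [fitness(t, pop, "4") for t in pop]
```

With this toy population the scores are roughly 1.0, 1.33, and 0.0: the first trajectory is correct but duplicates the third one's steps, so it earns no diversity bonus, while the second is both correct and structurally distinct and therefore ranks highest.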

📝 Abstract

Large Language Models (LLMs) exhibit strong mathematical reasoning when trained on high-quality Chain-of-Thought (CoT) data that articulates intermediate steps, yet costly CoT curation hinders further progress. While existing remedies such as distillation from stronger LLMs and self-synthesis based on test-time search alleviate this issue, they often suffer from diminishing returns or high computational overhead. In this work, we propose CoTEvol, a genetic evolutionary framework that casts CoT generation as a population-based search over reasoning trajectories. Candidate trajectories are iteratively evolved through reflective global crossover at the trajectory level and uncertainty-guided local mutation at the step level, enabling holistic recombination and fine-grained refinement. Lightweight, task-aware fitness functions guide the evolutionary process toward accurate and diverse reasoning. Empirically, CoTEvol improves correct-CoT synthesis success by over 30% and enhances structural diversity, with markedly improved efficiency. LLMs trained on these evolutionary CoT data achieve an average gain of 6.6% across eight math benchmarks, outperforming previous distillation and self-synthesis approaches. These results underscore the promise of evolutionary CoT synthesis as a scalable and effective method for mathematical reasoning tasks.
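
The evolutionary loop the abstract describes (crossover at the trajectory level, uncertainty-guided mutation at the step level, fitness-driven selection) can be illustrated on a toy population. This is a minimal sketch under stated assumptions, not the authors' implementation: plain one-point crossover stands in for the paper's reflective global crossover; each step carries a hypothetical `uncertainty` score (for example, mean token entropy); mutation rewrites only the most uncertain step through a hypothetical `revise` callback that would be an LLM call in practice; and survivors are chosen by simple elitist truncation. All names here are illustrative.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Step:
    text: str
    uncertainty: float  # hypothetical per-step score, e.g. mean token entropy


Trajectory = list[Step]


def crossover(a: Trajectory, b: Trajectory, rng: random.Random) -> Trajectory:
    """Trajectory-level one-point crossover: a prefix of `a` joined to a suffix of `b`."""
    cut = rng.randint(1, min(len(a), len(b)))  # keep at least one step from `a`
    return a[:cut] + b[cut:]


def mutate(traj: Trajectory, revise: Callable[[Step], Step]) -> Trajectory:
    """Step-level mutation: rewrite only the step with the highest uncertainty."""
    idx = max(range(len(traj)), key=lambda i: traj[i].uncertainty)
    child = list(traj)
    child[idx] = revise(traj[idx])
    return child


def evolve(
    population: list[Trajectory],
    fitness: Callable[[Trajectory], float],
    revise: Callable[[Step], Step],
    generations: int = 3,
    rng: Optional[random.Random] = None,
) -> list[Trajectory]:
    """Elitist (mu + lambda) loop: parents and offspring compete on fitness."""
    if rng is None:
        rng = random.Random(0)
    size = len(population)
    for _ in range(generations):
        offspring = []
        for _ in range(size):
            a, b = rng.sample(population, 2)  # two distinct parents
            offspring.append(mutate(crossover(a, b, rng), revise))
        pool = population + offspring
        pool.sort(key=fitness, reverse=True)
        population = pool[:size]  # keep the fittest `size` trajectories
    return population


# Toy stand-ins (hypothetical): a step counts as correct iff its text is "ok".
def toy_fitness(traj: Trajectory) -> float:
    return sum(s.text == "ok" for s in traj) / len(traj)


def toy_revise(step: Step) -> Step:
    return Step("ok", 0.0)  # pretend the rewrite fixed the step with full confidence


init = [
    [Step("ok", 0.1), Step("bad", 0.9), Step("ok", 0.2)],
    [Step("bad", 0.8), Step("ok", 0.1)],
    [Step("ok", 0.3), Step("bad", 0.7), Step("bad", 0.6)],
]

if __name__ == "__main__":
    best = evolve(init, toy_fitness, toy_revise, generations=5, rng=random.Random(42))
    print([s.text for s in best[0]], toy_fitness(best[0]))
```

Because parents compete with their offspring for survival, the best fitness in the population never decreases from one generation to the next. A practical system would replace `toy_fitness` with answer verification plus a diversity term, and `toy_revise` with a model call conditioned on the surrounding steps.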

Problem

Research questions and friction points this paper is trying to address.

Chain-of-Thought

Data Synthesis

Mathematical Reasoning

Large Language Models

Evolutionary Framework

Innovation

Methods, ideas, or system contributions that make the work stand out.

Chain-of-Thought

Genetic Evolution

Data Synthesis