Expanding Reasoning Potential in Foundation Model by Learning Diverse Chains of Thought Patterns

📅 2025-09-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing methods for Chain-of-Thought (CoT) data utilization lack selectivity, failing to identify the most effective data types for enhancing model reasoning capabilities. Method: This work formally defines “reasoning potential” as the reciprocal of the number of independent attempts required for correct problem solving; constructs a core reference set grounded in atomic reasoning patterns; and proposes a dual-granularity filtering algorithm that integrates token-level entropy analysis with a Mixture-of-Experts (MoE) architecture to precisely inject high-value CoT data during mid-training. Results: Using only 10B tokens of curated data, the approach improves the performance of an 85A6B MoE model by 9.58% on AIME 2024/2025 and raises the upper bound of downstream reinforcement learning performance by 7.81%. Core contributions include: (i) a theoretical model of reasoning potential, (ii) an atomic pattern-based representation framework for reasoning, and (iii) an efficient, principled paradigm for CoT data selection.
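The paper's definition of reasoning potential — the reciprocal of the number of independent attempts a model needs before solving a problem correctly — can be read as a simple Monte Carlo estimate. A minimal sketch, assuming a `solve_once` callable that runs one independent attempt and reports correctness (the function name and attempt budget are illustrative, not from the paper):

```python
def reasoning_potential(solve_once, max_attempts=64):
    """Estimate reasoning potential as 1/k, where k is the index of the
    first independent attempt that solves the problem correctly.

    Returns 0.0 if no attempt within the budget succeeds, i.e. the
    problem is effectively out of reach for the model at this budget.
    """
    for k in range(1, max_attempts + 1):
        if solve_once():  # one independent sampling/solving attempt
            return 1.0 / k
    return 0.0
```

Under this reading, a problem solved on the first try has potential 1.0, one solved on the fourth try has potential 0.25, and unsolvable problems contribute nothing — which matches the paper's claim that the metric correlates with final model performance.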

📝 Abstract
Recent progress in large reasoning models for challenging mathematical reasoning has been driven by reinforcement learning (RL). Incorporating long chain-of-thought (CoT) data during mid-training has also been shown to substantially improve reasoning depth. However, current approaches often utilize CoT data indiscriminately, leaving open the critical question of which data types most effectively enhance model reasoning capabilities. In this paper, we define the foundation model's reasoning potential for the first time as the inverse of the number of independent attempts required to correctly answer the question, which is strongly correlated with the final model performance. We then propose utilizing diverse data enriched with high-value reasoning patterns to expand the reasoning potential. Specifically, we abstract atomic reasoning patterns from CoT sequences, characterized by commonality and inductive capabilities, and use them to construct a core reference set enriched with valuable reasoning patterns. Furthermore, we propose a dual-granularity algorithm involving chains of reasoning patterns and token entropy, efficiently selecting high-value CoT data (CoTP) from the data pool that aligns with the core set, thereby training models to master reasoning effectively. Only 10B-token CoTP data enables the 85A6B Mixture-of-Experts (MoE) model to improve by 9.58% on the challenging AIME 2024 and 2025, and to raise the upper bound of downstream RL performance by 7.81%.
Problem

Research questions and friction points this paper is trying to address.

Identifying which chain-of-thought data types most effectively enhance model reasoning capabilities
Defining reasoning potential as the inverse of the number of independent attempts needed to answer a question correctly
Developing methods to select high-value reasoning patterns for training models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Abstract atomic reasoning patterns from CoT sequences
Construct core reference set with valuable reasoning patterns
Select high-value CoT data using dual-granularity algorithm
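The token-entropy half of the dual-granularity selection can be illustrated with plain Shannon entropy over next-token distributions. This is a rough sketch of the idea only — the threshold, the flagging rule, and the function names below are assumptions, and the paper's actual algorithm additionally matches chains of reasoning patterns against the core reference set:

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def high_entropy_positions(token_dists, threshold=1.0):
    """Flag token positions whose predictive entropy exceeds a threshold,
    a crude proxy for high-uncertainty 'decision points' in a CoT trace.
    `token_dists` is a list of per-position probability distributions.
    """
    return [i for i, dist in enumerate(token_dists)
            if token_entropy(dist) > threshold]
```

A CoT sequence whose trace contains many such high-entropy positions would, under this heuristic, carry more branching reasoning decisions and thus be a stronger candidate for the curated CoTP pool.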