Self-Reasoning Language Models: Unfold Hidden Reasoning Chains with Few Reasoning Catalyst

📅 2025-05-20
📈 Citations: 3
Influential: 0
🤖 AI Summary
To address the bottleneck of large language models (LLMs) relying heavily on extensive human-annotated reasoning examples (e.g., Chain-of-Thought, CoT), this paper proposes Self-Reasoning Language Models (SRLMs). Our method introduces a meta-reasoning framework based on self-training, integrating prompt engineering, sampling augmentation, and cognitive skill modeling to enable iterative, annotation-free improvement of reasoning capabilities. Crucially, SRLMs leverage only 1,000 high-quality, few-shot reasoning examples as “catalysts” to trigger autonomous generation of long-horizon, multi-step implicit reasoning chains. The core innovation is the mechanism of “few-shot reasoning catalysis + model-autogenerated long-chain data,” which substantially enhances reasoning depth, diversity, and stability. Experiments across five benchmarks, including MMLU and GSM8K, demonstrate an average gain of +2.5 points, rising to +7.89 points when 64 samples are drawn per question at inference time.

📝 Abstract
Inference-time scaling has attracted much attention for significantly enhancing the performance of Large Language Models (LLMs) on complex reasoning tasks by increasing the length of the Chain-of-Thought. These longer intermediate reasoning rationales embody meta-reasoning skills from human cognition, such as reflection and decomposition, but are difficult to create and acquire. In this work, we introduce the Self-Reasoning Language Model (SRLM), in which the model itself synthesizes longer CoT data and iteratively improves its performance through self-training. By incorporating a few demonstration examples (i.e., 1,000 samples) of how to unfold hidden reasoning chains from existing responses, which act as a reasoning catalyst, we demonstrate that SRLM not only enhances the model's initial performance but also ensures more stable and consistent improvements in subsequent iterations. Our proposed SRLM achieves an average absolute improvement of more than +2.5 points across five reasoning tasks (MMLU, GSM8K, ARC-C, HellaSwag, and BBH) on two backbone models. Moreover, it yields larger gains with more samples drawn at inference, such as an absolute +7.89 average improvement with 64 sampling times, revealing the in-depth, diverse, and creative reasoning paths SRLM discovers against the strong baseline.
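The self-training loop the abstract describes can be sketched in a few lines. The toy below is an illustration under stated assumptions, not the paper's implementation: `sample_chain` is a hypothetical stand-in for the LLM's CoT sampler, the `catalyst` list stands in for the 1,000 seed demonstrations, and filtering by a verifiable answer is one plausible way to select which synthesized chains enter the next round's training pool.

```python
import random

def sample_chain(question, rng):
    # Hypothetical stand-in for the LLM sampling a reasoning chain plus a
    # final answer; like a real model, it is occasionally wrong.
    a, b = question
    answer = a + b if rng.random() < 0.8 else a + b + 1
    return f"add {a} and {b} step by step", answer

def self_train_round(questions, train_pool, n_samples=8, rng=None):
    """One SRLM-style iteration: for each question, sample several candidate
    chains and keep the first whose final answer checks out, growing the
    training pool that already holds the catalyst demonstrations."""
    rng = rng or random.Random(0)
    for q in questions:
        gold = q[0] + q[1]                       # verifiable reference answer
        for _ in range(n_samples):
            chain, ans = sample_chain(q, rng)
            if ans == gold:                      # filter out incorrect chains
                train_pool.append((q, chain))
                break
    return train_pool

catalyst = [((1, 1), "add 1 and 1 step by step")]  # stands in for 1,000 seeds
pool = self_train_round([(2, 3), (4, 5)], list(catalyst))
```

In the actual method the filtered pool would be used to fine-tune the model before the next iteration; here the loop only shows the sample-filter-accumulate structure.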
Problem

Research questions and friction points this paper is trying to address.

Enhancing LLMs' reasoning via self-generated Chain-of-Thought data
Improving reasoning stability with minimal catalyst examples
Boosting performance across multiple reasoning tasks iteratively
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-Reasoning Language Model synthesizes longer CoT data
Uses few reasoning catalyst examples for improvement
Iterative self-training enhances performance and stability
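The reported gains from repeated sampling at inference (+7.89 average with 64 sampling times) come from drawing many candidate reasoning paths and aggregating them. A self-consistency-style majority vote is one common aggregator; the sketch below assumes a hypothetical `sample_answer` callable and is not necessarily the paper's exact procedure.

```python
import random
from collections import Counter

def majority_answer(sample_answer, question, n=64):
    """Draw n candidate answers for one question and return the most
    frequent one (self-consistency-style voting over sampled chains)."""
    votes = Counter(sample_answer(question) for _ in range(n))
    return votes.most_common(1)[0][0]

rng = random.Random(42)
# Hypothetical noisy solver: right about 80% of the time, off by one otherwise.
noisy_solver = lambda q: 2 * q if rng.random() < 0.8 else 2 * q + 1
answer = majority_answer(noisy_solver, 7, n=64)
```

With more samples, the vote increasingly favors the answer reached by the most reliable reasoning paths, which is consistent with the paper's observation that accuracy grows with sampling times.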