Self-Reasoning Language Models: Unfold Hidden Reasoning Chains with Few Reasoning Catalyst

📅 2025-05-20
📈 Citations: 3
Influential: 0
🤖 AI Summary
To address the bottleneck of large language models (LLMs) relying heavily on extensive human-annotated reasoning examples (e.g., Chain-of-Thought, CoT), this paper proposes Self-Reasoning Language Models (SRLMs). Our method introduces a meta-reasoning framework based on self-training, integrating prompt engineering, sampling augmentation, and cognitive skill modeling to enable iterative, annotation-free improvement of reasoning capabilities. Crucially, SRLMs leverage only 1,000 high-quality, few-shot reasoning examples as “catalysts” to trigger autonomous generation of long-horizon, multi-step implicit reasoning chains. The core innovation is the mechanism of “few-shot reasoning catalysis + model-autogenerated long-chain data,” which substantially enhances reasoning depth, diversity, and stability. Experiments across five benchmarks, including MMLU and GSM8K, demonstrate an average gain of +2.5 points, rising to +7.89 points when 64 samples are drawn per question at inference time.

📝 Abstract
Inference-time scaling has attracted much attention for significantly enhancing the performance of Large Language Models (LLMs) on complex reasoning tasks by increasing the length of the Chain-of-Thought. These longer intermediate reasoning rationales embody meta-reasoning skills from human cognition, such as reflection and decomposition, but are difficult to create and acquire. In this work, we introduce the Self-Reasoning Language Model (SRLM), in which the model itself synthesizes longer CoT data and iteratively improves its performance through self-training. By incorporating a few demonstration examples (i.e., 1,000 samples) of how to unfold hidden reasoning chains from existing responses, which act as a reasoning catalyst, we demonstrate that SRLM not only enhances the model's initial performance but also ensures more stable and consistent improvements in subsequent iterations. Our proposed SRLM achieves an average absolute improvement of more than +2.5 points across five reasoning tasks (MMLU, GSM8K, ARC-C, HellaSwag, and BBH) on two backbone models. Moreover, it yields larger gains with more samples drawn at inference, such as an absolute +7.89 average improvement with 64 sampling times, revealing the in-depth, diverse, and creative reasoning paths SRLM discovers against the strong baseline.
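The self-training loop the abstract describes can be sketched in a few lines. The toy below is an illustration under stated assumptions, not the paper's implementation: `sample_chain` is a hypothetical stand-in for the LLM's CoT sampler, the `catalyst` list stands in for the 1,000 seed demonstrations, and filtering by a verifiable answer is one plausible way to select which synthesized chains enter the next round's training pool.

```python
import random

def sample_chain(question, rng):
    # Hypothetical stand-in for the LLM sampling a reasoning chain plus a
    # final answer; like a real model, it is occasionally wrong.
    a, b = question
    answer = a + b if rng.random() < 0.8 else a + b + 1
    return f"add {a} and {b} step by step", answer

def self_train_round(questions, train_pool, n_samples=8, rng=None):
    """One SRLM-style iteration: for each question, sample several candidate
    chains and keep the first whose final answer checks out, growing the
    training pool that already holds the catalyst demonstrations."""
    rng = rng or random.Random(0)
    for q in questions:
        gold = q[0] + q[1]                       # verifiable reference answer
        for _ in range(n_samples):
            chain, ans = sample_chain(q, rng)
            if ans == gold:                      # filter out incorrect chains
                train_pool.append((q, chain))
                break
    return train_pool

catalyst = [((1, 1), "add 1 and 1 step by step")]  # stands in for 1,000 seeds
pool = self_train_round([(2, 3), (4, 5)], list(catalyst))
```

In the actual method the filtered pool would be used to fine-tune the model before the next iteration; here the loop only shows the sample-filter-accumulate structure.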
Problem

Research questions and friction points this paper is trying to address.

Enhancing LLMs' reasoning via self-generated Chain-of-Thought data
Improving reasoning stability with minimal catalyst examples
Boosting performance across multiple reasoning tasks iteratively
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-Reasoning Language Model synthesizes longer CoT data
Uses few reasoning catalyst examples for improvement
Iterative self-training enhances performance and stability
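The reported gains from repeated sampling at inference (+7.89 average with 64 sampling times) come from drawing many candidate reasoning paths and aggregating them. A self-consistency-style majority vote is one common aggregator; the sketch below assumes a hypothetical `sample_answer` callable and is not necessarily the paper's exact procedure.

```python
import random
from collections import Counter

def majority_answer(sample_answer, question, n=64):
    """Draw n candidate answers for one question and return the most
    frequent one (self-consistency-style voting over sampled chains)."""
    votes = Counter(sample_answer(question) for _ in range(n))
    return votes.most_common(1)[0][0]

rng = random.Random(42)
# Hypothetical noisy solver: right about 80% of the time, off by one otherwise.
noisy_solver = lambda q: 2 * q if rng.random() < 0.8 else 2 * q + 1
answer = majority_answer(noisy_solver, 7, n=64)
```

With more samples, the vote increasingly favors the answer reached by the most reliable reasoning paths, which is consistent with the paper's observation that accuracy grows with sampling times.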