Self-Enhanced Reasoning Training: Activating Latent Reasoning in Small Models for Enhanced Reasoning Distillation

📅 2025-02-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Small language models (e.g., GPT-2) exhibit weak reasoning capabilities and struggle to distill reasoning knowledge from larger models effectively. The authors observe that such models implicitly generate high-quality reasoning paths during zero-shot sampling, but standard decoding strategies suppress these paths. Method: The paper proposes what it describes as the first teacher-free, annotation-free self-amplification training paradigm: it filters zero-shot self-generated reasoning paths by path quality, applies confidence-weighted self-training, and jointly optimizes with GPT-3.5-based reasoning distillation to activate and harness the model's latent reasoning capacity. Contribution/Results: The approach removes the usual reliance on teacher-provided chain-of-thought prompts or explicit reasoning-path distillation, and achieves significant performance gains for GPT-2 on mathematical and logical reasoning benchmarks; its reasoning distillation efficacy surpasses current state-of-the-art methods.
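The summary's "filtered by path quality" and "confidence-weighted" steps can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the answer-matching filter, the use of sequence log-probability as the confidence signal, and all function names are assumptions.

```python
import math

# Hedged sketch of a SERT-style data pipeline (names are illustrative, not
# from the paper): sample zero-shot generations, keep only paths whose final
# answer is correct, and weight each kept path by the model's confidence.

def extract_answer(path: str) -> str:
    """Toy answer extractor: the last whitespace-separated token."""
    return path.strip().split()[-1] if path.strip() else ""

def filter_paths(samples, gold_answer):
    """Keep sampled reasoning paths whose extracted answer matches gold."""
    return [(path, logprob) for path, logprob in samples
            if extract_answer(path) == gold_answer]

def confidence_weights(filtered):
    """Softmax over per-path log-probabilities -> self-training weights."""
    if not filtered:
        return []
    logps = [lp for _, lp in filtered]
    m = max(logps)                      # subtract max for numeric stability
    exps = [math.exp(lp - m) for lp in logps]
    z = sum(exps)
    return [e / z for e in exps]

# Toy zero-shot samples: (reasoning path, sequence log-probability).
samples = [
    ("3 plus 4 equals 7", -2.0),       # correct, fairly confident
    ("3 plus 4 equals 12", -1.5),      # wrong answer -> filtered out
    ("adding 3 and 4 gives 7", -3.0),  # correct, less confident
]

kept = filter_paths(samples, gold_answer="7")
weights = confidence_weights(kept)
```

More confident correct paths receive larger self-training weight, which matches the intuition of amplifying reasoning the model already produces but rarely decodes.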

📝 Abstract
The rapid advancement of large language models (LLMs) has significantly enhanced their reasoning abilities, enabling increasingly complex tasks. However, these capabilities often diminish in smaller, more computationally efficient models like GPT-2. Recent research shows that reasoning distillation can help small models acquire reasoning capabilities, but most existing methods focus primarily on improving teacher-generated reasoning paths. Our observations reveal that small models can generate high-quality reasoning paths during sampling, even without chain-of-thought prompting, though these paths are often latent due to their low probability under standard decoding strategies. To address this, we propose Self-Enhanced Reasoning Training (SERT), which activates and leverages latent reasoning capabilities in small models through self-training on filtered, self-generated reasoning paths under zero-shot conditions. Experiments using OpenAI's GPT-3.5 as the teacher model and GPT-2 models as the student models demonstrate that SERT enhances the reasoning abilities of small models, improving their performance in reasoning distillation.
Problem

Research questions and friction points this paper is trying to address.

Enhance reasoning in small models
Activate latent reasoning paths
Improve reasoning distillation efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-Enhanced Reasoning Training
Activates latent reasoning capabilities
Zero-shot self-training on filtered paths
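The joint optimization with teacher distillation mentioned above can be sketched as a convex combination of the two objectives. The mixing coefficient `lam`, the equal-weight default, and the function names are assumptions for illustration; the paper's actual loss formulation may differ.

```python
# Illustrative joint objective: confidence-weighted self-training loss
# combined with a teacher-based distillation loss. All names and the
# default mixing weight are assumptions, not taken from the paper.

def weighted_self_training_loss(path_losses, weights):
    """Confidence-weighted average of per-path training losses."""
    return sum(w * l for w, l in zip(weights, path_losses))

def joint_loss(self_loss: float, distill_loss: float, lam: float = 0.5) -> float:
    """Convex combination of self-training and distillation objectives."""
    return lam * self_loss + (1.0 - lam) * distill_loss
```

Under this sketch, `lam = 1.0` recovers pure self-training and `lam = 0.0` recovers pure distillation from the GPT-3.5 teacher.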
Yong Zhang
Ping An Technology (Shenzhen) Co., Ltd., China
Bingyuan Zhang
Ping An Technology (Shenzhen) Co., Ltd., China
Zhitao Li
Ping An Technology (Shenzhen) Co., Ltd., China
Ming Li
Ping An Technology (Shenzhen) Co., Ltd., China
Ning Cheng
TeraHop
Minchuan Chen
Ping An Technology (Shenzhen) Co., Ltd., China
Tao Wei
Ping An Technology (Shenzhen) Co., Ltd., China
Jun Ma
Ping An Technology (Shenzhen) Co., Ltd., China
Shaojun Wang
Soochow University, TU/e, University of Strasbourg
Jing Xiao
Ping An Technology (Shenzhen) Co., Ltd., China