AI Summary
To address the limited performance of 0.5B-parameter small reasoning language models (SRLMs) on complex tasks such as mathematical reasoning and code generation, this paper proposes the first multi-stage collaborative training paradigm specifically designed for small models. The method systematically integrates supervised fine-tuning (SFT), knowledge distillation (KD), and reinforcement learning (RL), augmented by task-specific data construction and a unified evaluation protocol. Experiments demonstrate substantial improvements in reasoning capability: on benchmarks including GSM8K, MATH, and HumanEval, the resulting model approaches the performance of 7B-parameter models, reaching over 90% of their scores on several metrics. This work validates the practical viability of lightweight models in efficiency-critical, resource-constrained, and privacy-sensitive applications, and establishes a reproducible, scalable methodological framework for enhancing the reasoning capabilities of small language models.
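The staged SFT → KD → RL pipeline described above might be orchestrated roughly as follows. This is a minimal sketch under stated assumptions: the stage functions, their signatures, and the fixed SFT-then-KD-then-RL ordering are illustrative, not the paper's exact implementation.

```python
# Illustrative sketch of a multi-stage training pipeline for a 0.5B
# student model: SFT -> KD -> RL. Stage bodies are stubs that only
# record which stage ran; a real pipeline would update model weights.

def supervised_fine_tune(model, sft_data):
    # Stage 1: fit the student on curated task-specific demonstrations
    # (e.g., step-by-step math solutions, annotated code).
    model["stages"].append("sft")
    return model

def distill_from_teacher(model, teacher_outputs):
    # Stage 2: train the student to match a larger teacher's outputs
    # (soft labels / reasoning traces from a 7B-class model).
    model["stages"].append("kd")
    return model

def reinforce(model, reward_fn):
    # Stage 3: optimize reasoning behavior against a task reward,
    # e.g., final-answer correctness on GSM8K-style problems.
    model["stages"].append("rl")
    return model

def train_pipeline(model, sft_data, teacher_outputs, reward_fn):
    # Stages are applied sequentially; each stage starts from the
    # checkpoint produced by the previous one.
    model = supervised_fine_tune(model, sft_data)
    model = distill_from_teacher(model, teacher_outputs)
    model = reinforce(model, reward_fn)
    return model

student = {"params": "0.5B", "stages": []}
student = train_pipeline(student, sft_data=[], teacher_outputs=[],
                         reward_fn=lambda sample: 0.0)
print(student["stages"])
```

The key design choice the sketch encodes is sequential hand-off: each stage consumes the checkpoint left by the previous one rather than training from scratch, which is what makes the paradigm "multi-stage collaborative" rather than three independent runs.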
Abstract
The ongoing evolution of language models has led to the development of large-scale architectures that demonstrate exceptional performance across a wide range of tasks. However, these models come with significant computational and energy demands, as well as potential privacy implications. In this context, Small Reasoning Language Models (SRLMs) with approximately 0.5 billion parameters present a compelling alternative due to their remarkable computational efficiency and cost-effectiveness, particularly in resource-constrained environments. Despite these advantages, the limited capacity of 0.5-billion-parameter models poses challenges in handling complex tasks such as mathematical reasoning and code generation. This research investigates various training strategies, including supervised fine-tuning (SFT), knowledge distillation (KD), and reinforcement learning (RL), as well as their hybrid implementations, to enhance the performance of 0.5B SRLMs. We analyze effective methodologies for bridging the performance gap between SRLMs and larger models and present insights into optimal training pipelines tailored for these smaller architectures. Through extensive experimental validation and analysis, our work aims to provide actionable recommendations for maximizing the reasoning capabilities of 0.5B models.