AI Summary
To address the limited performance of 0.5B-parameter small reasoning language models (SRLMs) on complex tasks such as mathematical reasoning and code generation, this paper proposes the first multi-stage collaborative training paradigm specifically designed for small models. The method systematically integrates supervised fine-tuning (SFT), knowledge distillation (KD), and reinforcement learning (RL), augmented by task-specific data construction and a unified evaluation protocol. Experiments demonstrate substantial improvements in reasoning capability: on benchmarks including GSM8K, MATH, and HumanEval, the resulting model approaches the performance of 7B-parameter models, reaching over 90% of their scores on several metrics. This work validates the practical viability of lightweight models in efficiency-critical, resource-constrained, and privacy-sensitive applications, and establishes a reproducible, scalable methodological framework for enhancing the reasoning capabilities of small language models.
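The staged SFT → KD → RL pipeline described above might be orchestrated roughly as follows. This is a minimal sketch under stated assumptions: the stage functions, their signatures, and the fixed SFT-then-KD-then-RL ordering are illustrative, not the paper's exact implementation.

```python
# Illustrative sketch of a multi-stage training pipeline for a 0.5B
# student model: SFT -> KD -> RL. Stage bodies are stubs that only
# record which stage ran; a real pipeline would update model weights.

def supervised_fine_tune(model, sft_data):
    # Stage 1: fit the student on curated task-specific demonstrations
    # (e.g., step-by-step math solutions, annotated code).
    model["stages"].append("sft")
    return model

def distill_from_teacher(model, teacher_outputs):
    # Stage 2: train the student to match a larger teacher's outputs
    # (soft labels / reasoning traces from a 7B-class model).
    model["stages"].append("kd")
    return model

def reinforce(model, reward_fn):
    # Stage 3: optimize reasoning behavior against a task reward,
    # e.g., final-answer correctness on GSM8K-style problems.
    model["stages"].append("rl")
    return model

def train_pipeline(model, sft_data, teacher_outputs, reward_fn):
    # Stages are applied sequentially; each stage starts from the
    # checkpoint produced by the previous one.
    model = supervised_fine_tune(model, sft_data)
    model = distill_from_teacher(model, teacher_outputs)
    model = reinforce(model, reward_fn)
    return model

student = {"params": "0.5B", "stages": []}
student = train_pipeline(student, sft_data=[], teacher_outputs=[],
                         reward_fn=lambda sample: 0.0)
print(student["stages"])
```

The key design choice the sketch encodes is sequential hand-off: each stage consumes the checkpoint left by the previous one rather than training from scratch, which is what makes the paradigm "multi-stage collaborative" rather than three independent runs.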
Abstract
The ongoing evolution of language models has led to the development of large-scale architectures that demonstrate exceptional performance across a wide range of tasks. However, these models come with significant computational and energy demands, as well as potential privacy implications. In this context, Small Reasoning Language Models (SRLMs) with approximately 0.5 billion parameters present a compelling alternative due to their remarkable computational efficiency and cost-effectiveness, particularly in resource-constrained environments. Despite these advantages, the limited capacity of 0.5-billion-parameter models poses challenges in handling complex tasks such as mathematical reasoning and code generation. This research investigates various training strategies, including supervised fine-tuning (SFT), knowledge distillation (KD), and reinforcement learning (RL), as well as their hybrid implementations, to enhance the performance of 0.5B SRLMs. We analyze effective methodologies for bridging the performance gap between SRLMs and larger models and present insights into optimal training pipelines tailored for these smaller architectures. Through extensive experimental validation and analysis, our work aims to provide actionable recommendations for maximizing the reasoning capabilities of 0.5B models.