Tina: Tiny Reasoning Models via LoRA

📅 2025-04-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
How can the reasoning capabilities of small-scale language models be enhanced cost-effectively? Method: The paper introduces a parameter-efficient fine-tuning approach that combines Low-Rank Adaptation (LoRA) with reinforcement learning (RL) for reasoning. Applied to a 1.5B-parameter base model, it leverages structured reasoning formats to enable rapid adaptation. Contribution/Results: The paper presents the first empirical validation of LoRA's efficacy in RL-based reasoning fine-tuning. The best model achieves 43.33% Pass@1 on AIME24, surpassing prior state-of-the-art methods built on the same base model, at a post-training and evaluation cost of only $9. It delivers a >20% improvement in reasoning performance and an estimated 260x reduction in post-training cost compared with standard full-parameter RL fine-tuning. All trained models, source code, and evaluation pipelines are publicly released.
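The core mechanism is the standard LoRA parameterization: the frozen base weight W is augmented with a trainable low-rank update B·A, so RL only updates the small A and B matrices. A minimal NumPy sketch (illustrative only, not the paper's training code; the alpha/r scaling follows the original LoRA convention):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """Forward pass through a frozen weight W plus a trainable
    low-rank update B @ A, scaled by alpha / r (standard LoRA)."""
    r = A.shape[0]                      # LoRA rank
    scaling = alpha / r
    return x @ W.T + (x @ A.T @ B.T) * scaling

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 8, 2
W = rng.normal(size=(d_out, d_in))     # frozen base weight (not trained)
A = rng.normal(size=(r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))               # trainable up-projection, zero-init

x = rng.normal(size=(1, d_in))
# With B zero-initialized, the LoRA model starts exactly at the base model.
assert np.allclose(lora_forward(x, W, A, B), x @ W.T)

# Trainable parameter count: r*(d_in + d_out) vs. d_in*d_out for full tuning.
trainable = r * (d_in + d_out)
full = d_in * d_out
```

Because only r·(d_in + d_out) parameters receive gradients instead of d_in·d_out, optimizer state and backward-pass memory shrink accordingly, which is what makes RL post-training this cheap.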

📝 Abstract
How cost-effectively can strong reasoning abilities be achieved in language models? Driven by this fundamental question, we present Tina, a family of tiny reasoning models achieved with high cost-efficiency. Notably, Tina demonstrates that substantial reasoning performance can be developed using only minimal resources, by applying parameter-efficient updates during reinforcement learning (RL), using low-rank adaptation (LoRA), to an already tiny 1.5B parameter base model. This minimalist approach produces models that achieve reasoning performance which is competitive with, and sometimes surpasses, SOTA RL reasoning models built upon the same base model. Crucially, this is achieved at a tiny fraction of the computational post-training cost employed by existing SOTA models. In fact, the best Tina model achieves a >20% reasoning performance increase and 43.33% Pass@1 accuracy on AIME24, at only $9 USD post-training and evaluation cost (i.e., an estimated 260x cost reduction). Our work reveals the surprising effectiveness of efficient RL reasoning via LoRA. We validate this across multiple open-source reasoning datasets and various ablation settings starting with a single, fixed set of hyperparameters. Furthermore, we hypothesize that this effectiveness and efficiency stem from LoRA rapidly adapting the model to the structural format of reasoning rewarded by RL, while largely preserving the base model's underlying knowledge. In service of accessibility and open research, we fully open-source all code, training logs, and model weights & checkpoints.
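The abstract's hypothesis, that LoRA mainly adapts the model to the structural reasoning format rewarded by RL, can be illustrated with a toy rule-based format reward of the kind common in open-source RL reasoning recipes. The `<think>`/`<answer>` tag scheme below is an illustrative assumption, not necessarily the exact format Tina trains on:

```python
import re

def format_reward(completion: str) -> float:
    """Return 1.0 if the completion follows a
    <think>...</think><answer>...</answer> structure, else 0.0.
    The tag names are an illustrative assumption."""
    pattern = r"^<think>.+?</think>\s*<answer>.+?</answer>$"
    return 1.0 if re.match(pattern, completion.strip(), re.DOTALL) else 0.0

# A well-structured completion is rewarded; free-form text is not.
assert format_reward("<think>2 + 2 = 4</think><answer>4</answer>") == 1.0
assert format_reward("The answer is 4.") == 0.0
```

A reward like this depends only on surface structure, so a low-rank update can plausibly satisfy it quickly while leaving the base model's knowledge largely untouched, which is consistent with the paper's efficiency argument.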
Problem

Research questions and friction points this paper is trying to address.

Achieving strong reasoning in tiny models cost-effectively
Enhancing reasoning via LoRA with minimal resources
Validating efficient RL reasoning across multiple datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses LoRA for parameter-efficient RL updates
Matches or surpasses SOTA RL reasoning models on the same 1.5B base
Cuts post-training cost by an estimated 260x (about $9 total)