Learning to Reason in 13 Parameters

📅 2026-02-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates how few trainable parameters suffice to teach large language models to reason. The authors propose TinyLoRA, an ultra-low-rank adapter that can shrink to as little as a single trainable parameter, and use it to fine-tune the 8B-parameter Qwen2.5 model within a reinforcement learning framework. With only 13 trained parameters (26 bytes in bf16), the model reaches 91% accuracy on GSM8K, demonstrating that such an extremely sparse parameterization can acquire complex reasoning capabilities well below the dimensional floor of conventional low-rank adaptation. Across harder benchmarks, including AIME, AMC, and MATH500, the method recovers 90% of the performance gains while training 1000× fewer parameters than standard approaches. Reinforcement learning is pivotal: supervised fine-tuning requires far larger updates to reach the same performance.

📝 Abstract
Recent research has shown that language models can learn to *reason*, often via reinforcement learning. Some work even trains low-rank parameterizations for reasoning, but conventional LoRA cannot scale below the model dimension. We question whether even rank=1 LoRA is necessary for learning to reason and propose TinyLoRA, a method for scaling low-rank adapters to sizes as small as one parameter. Within our new parameterization, we are able to train the 8B parameter size of Qwen2.5 to 91% accuracy on GSM8K with only 13 trained parameters in bf16 (26 total bytes). We find this trend holds in general: we are able to recover 90% of performance improvements while training 1000× fewer parameters across a suite of more difficult learning-to-reason benchmarks such as AIME, AMC, and MATH500. Notably, we are only able to achieve such strong performance with RL: models trained using SFT require 100-1000× larger updates to reach the same performance.
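The paper does not spell out TinyLoRA's exact parameterization here, but the core idea of scaling an adapter below rank 1 can be sketched as follows: keep the low-rank direction pairs frozen and random, and train only a tiny vector of scales. Everything below (names, dimensions, the scalar-scale construction) is an illustrative assumption, not the authors' actual implementation.

```python
import numpy as np

# Hypothetical sketch in the spirit of TinyLoRA: frozen random direction
# pairs (U_i, V_i) define the adapter's subspace, and only a tiny vector
# of scales `s` is trained. 13 bf16 scales would be exactly 26 bytes,
# matching the parameter budget quoted in the abstract.
rng = np.random.default_rng(0)
d_out, d_in, n_trainable = 64, 64, 13

W = rng.standard_normal((d_out, d_in)) * 0.02   # frozen base weight
U = rng.standard_normal((n_trainable, d_out))   # frozen random directions
V = rng.standard_normal((n_trainable, d_in))    # frozen random directions
s = np.zeros(n_trainable)                       # the ONLY trainable parameters

def adapted_forward(x):
    """y = (W + sum_i s_i * outer(U_i, V_i)) @ x, without forming outer products."""
    base = W @ x
    delta = U.T @ (s * (V @ x))  # update contributed by the 13 trained scales
    return base + delta

x = rng.standard_normal(d_in)
# With s = 0 the adapter is a no-op, mirroring LoRA-style zero initialization.
assert np.allclose(adapted_forward(x), W @ x)
print("trainable parameters:", s.size)  # 13
```

Note the contrast with standard LoRA: even at rank 1, LoRA trains `d_out + d_in` parameters per adapted matrix, whereas fixing the directions and training only scales decouples the trainable count from the model dimension entirely.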
Problem

Research questions and friction points this paper is trying to address.

reasoning
parameter efficiency
low-rank adaptation
reinforcement learning
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

TinyLoRA
low-rank adaptation
parameter-efficient learning
reinforcement learning
reasoning