🤖 AI Summary
This work systematically investigates, for the first time, the applicability of ReLoRA to pretraining small language models (SLMs), addressing the open question of whether low-rank adaptation methods can be transferred directly from fine-tuning to pretraining in resource-constrained settings. Building on LoRA-style parameter-efficient techniques, the study conducts ablation experiments and learning-dynamics analysis, covering loss trajectories, perplexity, and grammar-task performance, to evaluate efficacy under limited compute. Results show that ReLoRA consistently underperforms full-parameter pretraining across SLMs, with the degradation worsening as model size increases. The root cause is identified as heightened rank deficiency in smaller models, which prevents low-rank updates from capturing the broad representational shifts required during pretraining. The work thus reveals an intrinsic limitation of low-rank methods in the pretraining phase and offers empirical evidence and a note of caution for the design of efficient pretraining strategies.
📝 Abstract
Parameter-efficient methods such as LoRA have revolutionised the fine-tuning of large language models (LLMs), but their extension to pretraining via ReLoRA is less well understood, especially for small language models (SLMs), which offer lower computational and environmental costs. This work presents the first systematic study of ReLoRA in SLMs (11M–66M parameters), evaluating both performance and learning dynamics. Through ablation experiments, we find that ReLoRA generally underperforms standard training on loss, Paloma perplexity, and BLiMP, with the gap widening for larger models. Further analysis of the models' learning dynamics indicates that ReLoRA reinforces the rank deficiencies found in smaller models. These results suggest that low-rank update strategies may not transfer easily to SLM pretraining, highlighting the need for further research in the low-compute regime.
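The rank constraint at the heart of this result can be illustrated with a minimal, hypothetical sketch of a ReLoRA-style merge-and-restart cycle (the dimensions, init scale, and function name below are illustrative, not taken from the paper): each cycle trains a rank-`r` factorisation, folds it into the base weights, and re-initialises the factors, so any single cycle's update has rank at most `r`.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2  # hypothetical layer width and LoRA rank; r << d

W = rng.normal(size=(d, d))          # base weight matrix
A = rng.normal(size=(d, r)) * 0.01   # low-rank factor A (small random init)
B = np.zeros((r, d))                 # factor B initialised to zero, as in LoRA

def relora_restart(W, A, B, rng):
    """Illustrative ReLoRA-style restart: merge A @ B into W,
    then re-initialise A and zero out B for the next cycle."""
    W = W + A @ B                    # fold the rank-<=r update into the base
    A = rng.normal(size=A.shape) * 0.01
    B = np.zeros_like(B)
    return W, A, B

# Pretend one training cycle has driven B to some nonzero values.
B = rng.normal(size=B.shape)
delta = A @ B                        # the update about to be merged
W, A, B = relora_restart(W, A, B, rng)

# Each cycle's update can never exceed rank r, regardless of training.
print(np.linalg.matrix_rank(delta) <= r)
```

After `k` restarts the cumulative update has rank at most `k * r`; the paper's finding is that in small models even this accumulated budget fails to match the broad weight changes that full-rank pretraining makes.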