🤖 AI Summary
To address the prohibitive computational and memory overhead of full-parameter fine-tuning for large language models (LLMs), this paper proposes Sparse Random Adapter (SRA): a parameter-efficient fine-tuning (PEFT) method that randomly freezes the vast majority of model parameters and applies standard backpropagation updates to only a tiny fraction (e.g., 0.1%–0.5%) of weights, without low-rank decomposition, adapter modules, or auxiliary parameters. Crucially, we identify sparsity itself as the core mechanism enabling efficient PEFT. Extensive experiments demonstrate that SRA matches or surpasses LoRA in performance across diverse alignment and downstream tasks, while reducing GPU memory consumption by 40% and FLOPs by 35%. To our knowledge, this is the first work to rigorously validate purely random sparse weight updates as a lightweight, general-purpose, and highly effective PEFT paradigm.
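The core idea can be sketched in a few lines: draw a fixed random boolean mask selecting a tiny fraction of weights, then apply an ordinary gradient step only where the mask is set. This is a minimal NumPy illustration of the mechanism described above, not the paper's implementation; the function names, the 0.5% density, and the plain SGD step are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_sparse_mask(shape, density, rng):
    """Boolean mask marking a random `density` fraction of entries as trainable."""
    n = int(np.prod(shape))
    k = max(1, int(round(density * n)))
    idx = rng.choice(n, size=k, replace=False)  # sample trainable positions
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    return mask.reshape(shape)

def sparse_sgd_step(W, grad, mask, lr=0.1):
    """Standard SGD update applied only where mask is True; frozen elsewhere."""
    return W - lr * grad * mask

W = rng.standard_normal((100, 100))
mask = make_sparse_mask(W.shape, density=0.005, rng=rng)  # 0.5% trainable
grad = rng.standard_normal(W.shape)
W_new = sparse_sgd_step(W, grad, mask)
```

In a deep-learning framework the same effect is usually achieved by multiplying each parameter's gradient by its mask (or zeroing `requires_grad` structure) before the optimizer step, so the frozen 99.5% of weights never change.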
📝 Abstract
Full fine-tuning of large language models for alignment and task adaptation has become prohibitively expensive as models have grown in size. Parameter-Efficient Fine-Tuning (PEFT) methods aim to significantly reduce the computational and memory resources needed for fine-tuning by training only a small subset of parameters rather than all model parameters. Currently, the most popular PEFT method is Low-Rank Adaptation (LoRA), which freezes the pretrained model's parameters and introduces a small set of trainable parameters in the form of low-rank matrices. We instead propose simply reducing the number of trainable parameters by randomly selecting a small proportion of the existing model parameters to train. In this paper, we compare the efficiency and performance of our proposed approach against PEFT methods, including LoRA, as well as full-parameter fine-tuning.
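To make the comparison concrete: LoRA parameterizes the weight update of a d×k matrix as a low-rank product ΔW = BA with B of shape d×r and A of shape r×k, so it trains r(d+k) parameters, while random sparse selection trains a fixed fraction of the d·k entries directly. The dimensions, rank, and 0.5% density below are illustrative assumptions, not values from the paper.

```python
# Trainable-parameter counts for one d x k weight matrix under three schemes.
d, k, r = 4096, 4096, 8   # hidden sizes and a small LoRA rank (illustrative)

full_params = d * k        # full fine-tuning: every entry is trainable
lora_params = r * (d + k)  # LoRA adapter: B (d x r) plus A (r x k)
sra_params = int(0.005 * d * k)  # random sparse selection at 0.5% density

print(full_params, lora_params, sra_params)
```

Both reduced schemes train well under 1% of the full matrix; the difference is that LoRA adds new matrices alongside the frozen weights, whereas random sparse selection updates a subset of the original weights in place.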