GaLLoP: Gradient-based Sparse Learning on Low-Magnitude Parameters

📅 2025-10-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

The effectiveness of sparse fine-tuning for large language models depends on which parameters are selected for updating, and a poor selection can lead to catastrophic forgetting and degraded generalization. To address this, we propose a joint gradient-magnitude selection criterion that updates only parameters with large task-specific gradients and small pre-trained magnitudes, balancing adaptation to the downstream task against retention of pre-trained knowledge. This reduces memorization of task data and helps preserve performance under distribution shift. We evaluate the method with LLaMA-3 8B and Gemma-2B as base models and compare it against established parameter-efficient fine-tuning baselines, including LoRA, DoRA, and SAFT. Results show that our approach matches or exceeds these methods across multiple downstream tasks, achieves strong in-distribution and out-of-distribution generalization, and remains stable across most random seeds.
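The summary does not say how "task-specific gradients" are measured. The Python sketch below assumes one simple option: sum the per-example absolute gradients over a small set of task examples. A toy linear model with squared-error loss stands in for an LLM. The function name `accumulate_grad_magnitudes` and the toy loss are hypothetical illustrations, not the paper's procedure.

```python
from typing import List, Sequence, Tuple


def accumulate_grad_magnitudes(
    weights: Sequence[float],
    data: Sequence[Tuple[Sequence[float], float]],
) -> List[float]:
    """Sum per-example |dL/dw_i| over task data for a toy linear model.

    Assumed (hypothetical) loss per example: 0.5 * (w . x - y) ** 2,
    so dL/dw_i = (w . x - y) * x_i.
    """
    totals = [0.0] * len(weights)
    for x, y in data:
        # Prediction error of the linear model on this example.
        residual = sum(w * xi for w, xi in zip(weights, x)) - y
        for i, xi in enumerate(x):
            # Accumulate the absolute per-example gradient for parameter i.
            totals[i] += abs(residual * xi)
    return totals


task_data = [([1.0, 2.0], 0.0), ([0.0, 1.0], 1.0)]
print(accumulate_grad_magnitudes([1.0, 0.0], task_data))  # [1.0, 3.0]
```

Summing absolute values per example, rather than taking the absolute value of the summed gradient, prevents opposite-signed gradients from cancelling. This choice is also an assumption.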

📝 Abstract

Sparse fine-tuning techniques adapt LLMs to downstream tasks by tuning only a sparse subset of model parameters. However, the effectiveness of sparse adaptation depends on optimally selecting the model parameters to be fine-tuned. In this work, we introduce a novel sparse fine-tuning technique named GaLLoP: Gradient-based Sparse Learning on Low-Magnitude Parameters. GaLLoP fine-tunes only those model parameters that have the largest gradient magnitudes on downstream tasks and the smallest pre-trained magnitudes, intuitively prioritizing parameters that are highly task-relevant but minimally disruptive to pre-trained knowledge. Our experiments with LLaMA3 8B and Gemma 2B as base models show that GaLLoP consistently improves or matches the in-distribution and out-of-distribution performance obtained with other leading parameter-efficient fine-tuning techniques, including LoRA, DoRA, and SAFT. Our analysis demonstrates that GaLLoP mitigates catastrophic forgetting and memorization of task data, as important pre-trained parameters remain unchanged. It also stabilizes performance relative to other fine-tuning techniques, generalizing robustly across most random seeds.
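The selection rule described in the abstract can be illustrated with a small Python sketch. The abstract does not state how the two criteria are combined, so the sketch assumes they are intersected. Under that assumption, a parameter is selected only if it falls in the top `grad_frac` fraction by gradient magnitude and in the bottom `mag_frac` fraction by absolute pre-trained value. The names `gallop_style_mask`, `grad_frac` and `mag_frac` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence


def gallop_style_mask(
    weights: Sequence[float],
    grad_mags: Sequence[float],
    grad_frac: float,
    mag_frac: float,
) -> List[bool]:
    """Hypothetical GaLLoP-style selection: large gradient AND small weight.

    Assumption: the two criteria are combined by intersecting two top-k sets.
    """
    n = len(weights)
    k_grad = max(1, int(grad_frac * n))
    k_mag = max(1, int(mag_frac * n))
    # Indices of the k_grad parameters with the largest task-gradient magnitudes.
    by_grad = sorted(range(n), key=lambda i: grad_mags[i], reverse=True)
    high_grad = set(by_grad[:k_grad])
    # Indices of the k_mag parameters with the smallest pre-trained magnitudes.
    by_mag = sorted(range(n), key=lambda i: abs(weights[i]))
    low_mag = set(by_mag[:k_mag])
    # Select a parameter only if it satisfies both criteria.
    return [i in high_grad and i in low_mag for i in range(n)]


pretrained = [0.9, -0.05, 0.02, -1.2]
grad_mags = [0.1, 0.7, 0.01, 0.9]
mask = gallop_style_mask(pretrained, grad_mags, grad_frac=0.5, mag_frac=0.5)
print(mask)  # [False, True, False, False]
```

Here, index 3 has the largest gradient but a large pre-trained weight, so it is protected. Index 2 has a small weight but a negligible gradient. Only index 1 satisfies both criteria. With aggressive fractions the intersection can be empty. A single combined score such as |g| / (|w| + eps) is another plausible design, and it is equally an assumption here.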

Problem

Research questions and friction points this paper is trying to address.

Optimally selecting parameters for sparse fine-tuning

Mitigating catastrophic forgetting during model adaptation

Improving generalization across distribution shifts

Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tunes only parameters with large task gradients and small pre-trained magnitudes

Prioritizes task-relevant yet minimally disruptive parameters

Mitigates catastrophic forgetting and stabilizes performance
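The claim that important pre-trained parameters remain unchanged follows directly from masking the update. The minimal sketch below assumes plain SGD, since the optimizer is not stated here, together with a boolean selection mask over parameters. Entries outside the mask are copied through untouched, so they keep exactly their pre-trained values. The helper `masked_sgd_step` is a hypothetical name.

```python
from typing import List, Sequence


def masked_sgd_step(
    weights: Sequence[float],
    grads: Sequence[float],
    mask: Sequence[bool],
    lr: float,
) -> List[float]:
    """One plain-SGD step applied only where mask is True (hypothetical helper)."""
    # Selected entries take a gradient step; all others are copied unchanged,
    # so unselected parameters keep their exact pre-trained values.
    return [w - lr * g if m else w for w, g, m in zip(weights, grads, mask)]


pretrained = [0.9, -0.05, 0.02, -1.2]
grads = [0.1, 0.7, 0.01, 0.9]
mask = [False, True, False, False]
updated = masked_sgd_step(pretrained, grads, mask, lr=0.1)
unchanged = sum(u == w for u, w in zip(updated, pretrained))
print(unchanged)  # 3
```

The same masking applies to any optimizer, for example by zeroing gradients outside the mask before the step. Note that optimizers with weight decay would also need the decay restricted to the mask for frozen entries to stay exact.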
