GaLLoP: Gradient-based Sparse Learning on Low-Magnitude Parameters

๐Ÿ“… 2025-10-22
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Existing sparse fine-tuning methods for large language models suffer from suboptimal parameter selection, leading to catastrophic forgetting and degraded generalization. To address this, we propose a joint gradient-and-magnitude selection mechanism that identifies parameter subsets exhibiting large task-specific gradients yet small pretrained magnitudes for update, thereby balancing downstream adaptation with retention of pretrained knowledge. This strategy mitigates the risks of overfitting to training data and distributional shift. We conduct systematic evaluations on LLaMA-3 8B and Gemma 2B, comparing our method against established parameter-efficient fine-tuning baselines including LoRA, DoRA, and SAFT. Results demonstrate that our approach matches or exceeds these methods across multiple downstream tasks, achieves strong in-distribution and out-of-distribution generalization, and exhibits high robustness across random seeds.

๐Ÿ“ Abstract
Sparse fine-tuning techniques adapt LLMs to downstream tasks by only tuning a sparse subset of model parameters. However, the effectiveness of sparse adaptation depends on optimally selecting the model parameters to be fine-tuned. In this work, we introduce a novel sparse fine-tuning technique named GaLLoP: Gradient-based Sparse Learning on Low-Magnitude Parameters, which fine-tunes only those model parameters which have the largest gradient magnitudes on downstream tasks and the smallest pre-trained magnitudes, intuitively prioritizing parameters that are highly task-relevant, but minimally disruptive to pre-trained knowledge. Our experimentation with LLaMA3 8B and Gemma 2B as base models shows that GaLLoP consistently improves or matches the in-distribution as well as out-of-distribution performance obtained via the usage of other leading parameter-efficient fine-tuning techniques, including LoRA, DoRA, and SAFT. Our analysis demonstrates that GaLLoP mitigates catastrophic forgetting and memorization of task data, as important pre-trained parameters remain unchanged, and stabilizes performance relative to other fine-tuning techniques, robustly generalizing across most random seeds.
Problem

Research questions and friction points this paper is trying to address.

Optimally selecting parameters for sparse fine-tuning
Mitigating catastrophic forgetting during model adaptation
Improving generalization across distribution shifts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tunes parameters with large gradients and small magnitudes
Prioritizes task-relevant yet minimally disruptive parameters
Mitigates catastrophic forgetting and stabilizes performance
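The selection rule described above (largest task-gradient magnitudes, smallest pretrained magnitudes) can be sketched as a one-shot masking step. This is an illustrative reading, not the paper's exact implementation: the ratio score and the top-k selection below are assumptions about how the two criteria might be combined.

```python
import numpy as np

def gallop_mask(weights, grads, sparsity=0.99, eps=1e-8):
    """Illustrative GaLLoP-style selection: mark for fine-tuning the
    parameters with large task-gradient magnitude relative to their
    pretrained magnitude. The ratio score is an assumption; the paper
    may combine the two criteria differently.
    """
    score = np.abs(grads) / (np.abs(weights) + eps)
    # Number of tunable parameters implied by the sparsity level
    k = int(round((1.0 - sparsity) * weights.size))
    # Indices of the k highest-scoring parameters (partial sort)
    idx = np.argpartition(score.ravel(), -k)[-k:]
    mask = np.zeros(weights.size, dtype=bool)
    mask[idx] = True
    return mask.reshape(weights.shape)

# During fine-tuning, gradients would then be masked so that only the
# selected parameters are updated, e.g.: w -= lr * grad * mask
```

Because high-magnitude pretrained parameters are excluded by the score's denominator, updates concentrate on weights whose change is least disruptive to pretrained knowledge, which matches the forgetting-mitigation claim above.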
๐Ÿ”Ž Similar Papers
No similar papers found.
Anand Choudhary
Sony Europe Ltd., Stuttgart Technology Center, EUREC; EPFL, Switzerland
Yasser Sulaiman
Sony Europe Ltd., Stuttgart Technology Center, EUREC; University of Stuttgart, Germany
Lukas Mauch
Sony Europe B.V.
machine learning · signal processing
G. B. Hacene
Sony Europe Ltd., Stuttgart Technology Center, EUREC
Fabien Cardinaux
Sony Europe Ltd., Stuttgart Technology Center, EUREC
Antoine Bosselut
EPFL
Natural Language Processing · Machine Learning · Commonsense Representation and Reasoning