Sparse Layer Sharpness-Aware Minimization for Efficient Fine-Tuning

📅 2026-02-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high computational cost of sharpness-aware minimization (SAM) during fine-tuning: the extra parameter perturbation step roughly doubles training overhead, which hinders practical deployment. The authors propose a sparsified SAM variant that dynamically selects a subset of model layers for the perturbation and update steps at each iteration. Layer selection is formulated as a multi-armed bandit problem in which layers are adaptively sampled according to their gradient norms, identifying the most critical ones. By combining sparse activation with this gradient-aware layer selection strategy, the method achieves highly efficient optimization across multiple tasks. It attains performance comparable to state-of-the-art baselines, including a #1 rank on large language model fine-tuning, while reducing the proportion of parameters activated during backpropagation to 47%, 22%, and 21% on vision, moderate-sized, and large language models, respectively.

📝 Abstract
Sharpness-aware minimization (SAM) seeks minima with a flat loss landscape to improve generalization performance in machine learning tasks, including fine-tuning. However, its extra parameter perturbation step doubles the computation cost, which becomes the bottleneck of SAM in practical implementations. In this work, we propose SL-SAM, an approach that breaks this bottleneck by introducing sparsity at the layer level. Our key innovation is to frame the dynamic selection of layers for both the gradient ascent (perturbation) and descent (update) steps as a multi-armed bandit problem. At the beginning of each iteration, SL-SAM samples a subset of the model's layers according to their gradient norms to participate in the backpropagation of the subsequent perturbation and update steps, thereby reducing the computational complexity. We then provide an analysis guaranteeing the convergence of SL-SAM. In fine-tuning experiments across several tasks, SL-SAM achieves performance comparable to state-of-the-art baselines, including a #1 rank on LLM fine-tuning. Meanwhile, SL-SAM significantly reduces the ratio of active parameters in backpropagation compared to vanilla SAM (SL-SAM activates 47%, 22%, and 21% of parameters on the vision, moderate, and large language models respectively, while vanilla SAM always activates 100%), verifying the efficiency of our proposed algorithm.
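The mechanism described in the abstract — sample layers by gradient norm, perturb only those layers (ascent step), then update them with the sharpness-aware gradient (descent step) — can be sketched as below. This is an illustrative approximation, not the paper's implementation: the function name `sl_sam_step`, the per-layer normalization, the one-off use of current gradient norms as sampling probabilities (the paper maintains a bandit estimate across iterations), and hyperparameters `rho` and `k` are all assumptions.

```python
import torch
import torch.nn as nn

def sl_sam_step(model, loss_fn, inputs, targets, lr=0.01, rho=0.05, k=1):
    """One sparse-layer SAM step (hypothetical sketch, not the paper's exact code)."""
    layers = [m for m in model.children()
              if any(p.requires_grad for p in m.parameters())]

    # 1) Forward/backward to obtain per-layer gradient norms.
    #    (The paper instead tracks these via a multi-armed bandit to avoid
    #    a full backward pass; this sketch recomputes them for clarity.)
    loss = loss_fn(model(inputs), targets)
    model.zero_grad()
    loss.backward()
    norms = torch.stack([
        sum(p.grad.norm() ** 2 for p in layer.parameters()) ** 0.5
        for layer in layers
    ])

    # 2) Sample k layers with probability proportional to gradient norm.
    idx = torch.multinomial(norms / norms.sum(), k, replacement=False)
    active = [layers[i] for i in idx]

    # 3) Ascent (perturbation) restricted to the sampled layers.
    eps = {}
    with torch.no_grad():
        for layer in active:
            for p in layer.parameters():
                e = rho * p.grad / (p.grad.norm() + 1e-12)
                p.add_(e)
                eps[p] = e

    # 4) Recompute the loss at the perturbed point, backpropagating
    #    only through the active layers (the source of the savings).
    for p in model.parameters():
        p.requires_grad_(False)
    for layer in active:
        for p in layer.parameters():
            p.requires_grad_(True)
    model.zero_grad()
    loss_fn(model(inputs), targets).backward()

    # 5) Undo the perturbation, then descend with the sharpness-aware gradient.
    with torch.no_grad():
        for p, e in eps.items():
            p.sub_(e)
        for layer in active:
            for p in layer.parameters():
                p.sub_(lr * p.grad)

    for p in model.parameters():
        p.requires_grad_(True)
    return loss.item()
```

Restricting `requires_grad` in step 4 is what reduces the ratio of active parameters in backpropagation; vanilla SAM would perturb and backpropagate through all layers in both passes.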
Problem

Research questions and friction points this paper is trying to address.

Sharpness-aware minimization
efficient fine-tuning
computational bottleneck
parameter perturbation
sparse optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse Layer
Sharpness-Aware Minimization
Multi-Armed Bandit
Efficient Fine-Tuning
Parameter Sparsity