Not All Adapters Matter: Selective Adapter Freezing for Memory-Efficient Fine-Tuning of Language Models

📅 2024-11-26
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing adapter-based fine-tuning methods, though parameter-efficient, suffer from substantial memory and computational overhead, prolonged training time, and imbalanced adapter contributions. This work is the first to empirically reveal the heterogeneous contribution of adapters within Transformer architectures. To address these issues, we propose a selective freezing mechanism: (i) dynamically assessing adapter importance via gradient sensitivity and task-specific contribution; (ii) implementing a staged, progressive freezing strategy; and (iii) incorporating implicit regularization to smooth the loss landscape and improve generalization. Our method maintains or even improves downstream task performance while significantly reducing memory consumption (−42.85%), FLOPs (−34.59%), and training time (−11.82%). The approach achieves superior efficiency without compromising robustness or accuracy, offering a principled and practical solution for resource-constrained adapter tuning.

📝 Abstract
Large-scale Transformer-based pre-trained models have achieved great success. Fine-tuning is the standard practice for leveraging these models in downstream tasks. Among fine-tuning methods, adapter-tuning provides parameter-efficient fine-tuning by introducing lightweight trainable modules while keeping most pre-trained parameters frozen. However, existing adapter-tuning methods still impose substantial resource usage. Through our investigation, we show that each adapter contributes unequally to both task performance and resource usage. Motivated by this insight, we propose Selective Adapter FrEezing (SAFE), which gradually freezes less important adapters early to reduce unnecessary resource usage while maintaining performance. In our experiments, SAFE reduces memory usage, computation amount, and training time by 42.85%, 34.59%, and 11.82%, respectively, while achieving comparable or better task performance compared to the baseline. We also demonstrate that SAFE induces a regularization effect that smooths the loss landscape, enabling the model to generalize better by avoiding sharp minima.
Problem

Research questions and friction points this paper is trying to address.

Selectively freeze adapters to reduce resource usage
Maintain performance while optimizing memory and computation
Improve generalization by smoothing loss landscape
Innovation

Methods, ideas, or system contributions that make the work stand out.

Selective Adapter Freezing (SAFE) reduces resource usage
Gradually freezes less important adapters early
Maintains performance while saving memory and computation
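The freezing idea summarized above can be sketched in a few lines: score each adapter's importance from its recent gradient magnitudes, then freeze the lowest-scoring ones. This is a minimal illustrative sketch, not the paper's exact formulation; the importance metric (mean gradient norm), the `keep_ratio` parameter, and all names are assumptions.

```python
# Hypothetical sketch of SAFE-style selective adapter freezing.
# The importance proxy (mean gradient norm) and the keep_ratio
# threshold are illustrative assumptions, not the paper's method.

def importance(grad_norms):
    """Mean gradient magnitude as a proxy for adapter importance."""
    return sum(grad_norms) / len(grad_norms)

def select_adapters_to_freeze(adapter_grads, keep_ratio=0.5):
    """Rank adapters by importance and pick the least important to freeze.

    adapter_grads: dict mapping adapter name -> list of recent gradient norms.
    Returns the set of adapter names to freeze.
    """
    scores = {name: importance(g) for name, g in adapter_grads.items()}
    ranked = sorted(scores, key=scores.get)  # least important first
    n_keep = max(1, int(len(ranked) * keep_ratio))
    return set(ranked[: len(ranked) - n_keep])

# Example: with four adapters and keep_ratio=0.5, the two adapters
# with the smallest average gradient norms get frozen.
grads = {"a": [0.1, 0.1], "b": [1.0, 1.0], "c": [0.5, 0.5], "d": [2.0, 2.0]}
frozen = select_adapters_to_freeze(grads, keep_ratio=0.5)
print(frozen)  # → {'a', 'c'}
```

In a real training loop, freezing an adapter would amount to disabling gradient computation for its parameters (e.g. setting `requires_grad = False` in PyTorch) and excluding them from the optimizer, which is where the memory and compute savings come from.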
Hyegang Son
Korea University
Yonglak Son
Korea University
Changhoon Kim
Arizona State University, Soongsil University
Young Geun Kim
Korea University
Operating Systems · Computer Architecture · Embedded Systems · Energy/Power Management · Mobile/IoT Architecture