🤖 AI Summary
To address the trade-off between the high computational cost of fully fine-tuning large language models (LLMs) on downstream tasks and the performance limitations of parameter-efficient fine-tuning (PEFT), this paper proposes ID³, a dynamic, progressive parameter-unmasking method. ID³ introduces an online parameter-assessment and adaptive sparsification strategy grounded in gradient-based importance estimation, dynamically unmasking parameters under a budget constraint by balancing exploration and exploitation. It integrates seamlessly with mainstream PEFT modules—including LoRA and adapters—without architectural modification. Evaluated across 15 natural language understanding and generation tasks, ID³ consistently outperforms fixed-mask PEFT baselines while halving the number of gradient updates and remaining robust to random neuron initialization. The method thus strikes a favorable balance between computational efficiency and generalization.
📝 Abstract
Fine-tuning large language models (LLMs) on downstream tasks requires substantial computational resources. A class of parameter-efficient fine-tuning (PEFT) techniques aims to mitigate these computational challenges by selectively fine-tuning only a small fraction of the model parameters. Although computationally efficient, these techniques often fail to match the performance of fully fine-tuned models, primarily due to inherent biases introduced during parameter selection. Traditional selective PEFT techniques use a fixed set of parameters based on a predefined budget (a process also known as unmasking), failing to capture parameter importance dynamically and often ending up exceeding the budget. We introduce $\text{ID}^3$, a novel selective PEFT method that calculates parameter importance continually and dynamically unmasks parameters by balancing exploration and exploitation in parameter selection. Our empirical study on 15 tasks spanning natural language understanding and generative tasks demonstrates the effectiveness of our method compared to fixed-masking-based PEFT techniques. We analytically show that $\text{ID}^3$ reduces the number of gradient updates by a factor of two, enhancing computational efficiency. $\text{ID}^3$ is robust to random initialization of neurons and, therefore, can be seamlessly integrated into existing additive and reparametrization-based PEFT modules such as adapters and LoRA for dynamic sparsification.
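To make the idea of budgeted dynamic unmasking concrete, below is a minimal NumPy sketch of one unmasking step. It is illustrative only: the importance score here is plain gradient magnitude and the exploration rule is uniform random sampling, whereas the paper's exact scoring function, schedule, and exploration-exploitation mechanism may differ. The function name `unmask_step` and all parameters (`k`, `budget`, `explore_frac`) are hypothetical.

```python
import numpy as np

def unmask_step(grads, mask, k, budget, explore_frac=0.2, rng=None):
    """Unmask up to k frozen parameters, never exceeding the total budget.

    Sketch of dynamic selective PEFT (assumed details, not the paper's
    exact rule): importance = |gradient|; most of the step's quota goes
    to the highest-importance frozen parameters (exploitation), the
    rest to uniformly random frozen parameters (exploration).
    """
    rng = rng or np.random.default_rng(0)
    k = min(k, budget - int(mask.sum()))  # respect the overall budget
    if k <= 0:
        return mask
    n_explore = int(round(explore_frac * k))
    n_exploit = k - n_explore
    # Already-unmasked parameters are excluded from selection.
    importance = np.where(mask, -np.inf, np.abs(grads))
    # Exploitation: unmask the currently most important frozen parameters.
    if n_exploit > 0:
        mask[np.argsort(importance)[-n_exploit:]] = True
    # Exploration: unmask a few frozen parameters at random.
    frozen = np.flatnonzero(~mask)
    if n_explore > 0 and frozen.size:
        picks = rng.choice(frozen, size=min(n_explore, frozen.size),
                           replace=False)
        mask[picks] = True
    return mask
```

Calling this once per training step (recomputing `grads` each time) grows the trainable set incrementally instead of committing to a fixed mask up front, which is the contrast with fixed-masking baselines that the abstract draws.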