🤖 AI Summary
Full-parameter fine-tuning of large language models is computationally expensive, and most existing parameter-efficient fine-tuning (PEFT) methods still update the same set of parameters throughout training, ignoring the unequal contribution of individual Transformer blocks. To address this, we propose Progtuning, a progressive fine-tuning framework that selects which blocks to update based on their contribution. Training proceeds in stages: Progtuning progressively reduces the set of updated Transformer blocks, so low-contribution blocks stop being updated while high-contribution blocks continue to be trained. It reduces the number of updated parameters by approximately 25% while maintaining competitive performance on downstream tasks, and it is compatible with mainstream PEFT techniques. By combining progressive learning with contribution-based block selection, Progtuning improves the allocation of computing resources and adapts well across fine-tuning scenarios.
📝 Abstract
Fine-tuning is a promising technique for leveraging Transformer-based language models in downstream tasks. As model sizes continue to grow, updating all model parameters becomes increasingly costly. Parameter-efficient fine-tuning methods address this issue by selectively updating a small subset of parameters. However, fine-tuning and most existing parameter-efficient fine-tuning methods keep the number of updated parameters fixed at its initial size throughout training, ignoring the unequal contribution across Transformer blocks and leading to extremely inefficient allocation of computing resources. In this paper, we propose Progtuning, a novel fine-tuning framework that combines fine-tuning with progressive learning for Transformer-based language models. Specifically, Progtuning progressively reduces the number of updated Transformer blocks based on their contribution. Progtuning optimizes resource allocation and reduces the number of updated parameters by approximately 25% while maintaining competitive performance. It also exhibits high adaptability with parameter-efficient fine-tuning methods, demonstrating excellent performance across various adaptation scenarios.
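The idea of progressively shrinking the set of updated blocks can be illustrated with a small sketch. It is not the paper's algorithm: the abstract does not say how contribution is measured or how stages are scheduled, so the per-block scores, the linear shrinking rule, and the names `progressive_schedule` and `average_updated_fraction` are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def progressive_schedule(contribution: Sequence[float], num_stages: int) -> List[List[int]]:
    """Return, for each stage, the sorted indices of blocks that are still updated.

    Assumption (not from the paper): stage s keeps the top
    ceil(n * (num_stages - s) / num_stages) blocks ranked by contribution score.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    n = len(contribution)
    # Rank block indices from highest to lowest (hypothetical) contribution.
    ranked = sorted(range(n), key=lambda i: contribution[i], reverse=True)
    stages = []
    for s in range(num_stages):
        keep = max(1, math.ceil(n * (num_stages - s) / num_stages))
        stages.append(sorted(ranked[:keep]))
    return stages


def average_updated_fraction(stages: List[List[int]], num_blocks: int) -> float:
    """Fraction of blocks updated, averaged over equally long stages."""
    return sum(len(stage) for stage in stages) / (len(stages) * num_blocks)


# Hypothetical scores for a 4-block model; higher means more contribution.
scores = [0.1, 0.2, 0.4, 0.8]
stages = progressive_schedule(scores, num_stages=4)
fraction = average_updated_fraction(stages, num_blocks=len(scores))
print(stages)
print(fraction)
```

In a real training loop, each stage's index list would decide which blocks have gradients enabled. The fraction printed here depends entirely on the assumed schedule and is not meant to reproduce the roughly 25% reduction reported in the abstract.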