🤖 AI Summary
This work investigates whether preserving model integrity is a necessary condition for performance improvement in large language model (LLM) fine-tuning. To address this, we propose Mask Fine-Tuning (MFT), a parameter-efficient method that learns binary masks end-to-end, supervised by the standard fine-tuning objective, to selectively activate subsets of frozen pretrained weights, adding no new weights to the deployed model and requiring no architectural modifications. We provide empirical evidence that model integrity is *not* essential for effective fine-tuning, thereby repositioning mask learning from a compression technique to a general-purpose performance-enhancement paradigm. Extensive experiments across multiple domains and backbones show that MFT delivers consistent gains, e.g., average coding improvements of 1.95% on LLaMA2-7B and 1.88% on LLaMA3.1-8B, together with stable training, plug-and-play deployment, cross-task generalizability, and robustness to hyperparameter variation.
📝 Abstract
Mainstream large language model (LLM) fine-tuning protocols usually keep the model intact, and no prior work has questioned whether maintaining the integrity of the model is indispensable for performance. In this work, we introduce Mask Fine-Tuning (MFT), a new LLM fine-tuning paradigm showing that properly breaking the integrity of the model can, surprisingly, improve performance. Specifically, MFT learns a set of binary masks supervised by the standard LLM fine-tuning objective. Extensive experiments show that MFT yields a consistent performance boost across various domains and backbones (e.g., average gains of 1.95%/1.88% in coding with LLaMA2-7B/3.1-8B). We further study MFT from multiple hyperparameter perspectives to provide deeper insight. Notably, MFT naturally extends the current LLM training protocol by being deployed on top of a complete, well-trained model. This study broadens the functionality of mask learning from its conventional network-pruning context for model compression to a more general scope.
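To make the core idea concrete, below is a minimal NumPy sketch of learning a binary mask over a single frozen weight matrix with a straight-through estimator. This is an illustration under our own assumptions (a toy linear-regression task, sign-based mask parameterization, hypothetical variable names); the paper's actual mask parameterization, objective, and training details may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# A "pretrained" frozen weight matrix and a synthetic regression task whose
# target is reachable by masking out some entries of W.
W = rng.normal(size=(4, 4))
X = rng.normal(size=(64, 4))
true_mask = (rng.random((4, 4)) > 0.3).astype(W.dtype)
y = X @ (W * true_mask)

# Learnable real-valued scores; the hard binary mask is their sign.
scores = np.zeros_like(W)
lr = 0.1

def mse(scores):
    """Loss of the frozen weights under the current hard binary mask."""
    mask = (scores >= 0).astype(W.dtype)
    return float(np.mean((X @ (W * mask) - y) ** 2))

init_loss = mse(scores)  # scores start at 0, so the mask starts all-ones

for _ in range(200):
    mask = (scores >= 0).astype(W.dtype)       # hard binary mask
    err = X @ (W * mask) - y
    grad_masked_W = X.T @ err / len(X)         # dL/d(W * mask)
    # Straight-through estimator: treat d(mask)/d(scores) as identity and
    # update the scores with the gradient w.r.t. the mask, dL/d(mask).
    scores -= lr * grad_masked_W * W

final_loss = mse(scores)
```

W itself is never updated; only the mask scores are trained, which mirrors how MFT keeps the pretrained weights frozen while learning which subset of them to activate.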