Boosting Large Language Models with Mask Fine-Tuning

📅 2025-03-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates whether preserving model integrity is a necessary condition for performance improvement in large language model (LLM) fine-tuning. To probe this, the authors propose Mask Fine-Tuning (MFT), which learns a set of binary masks end-to-end under the standard fine-tuning objective, selectively deactivating subsets of pretrained weights while adding no new parameters to the deployed model and requiring no architectural modifications. The results provide empirical evidence that model integrity is not essential for effective fine-tuning, repositioning mask learning from a pruning-and-compression technique to a general-purpose performance-enhancement paradigm. Extensive experiments across domains and backbones show consistent gains, e.g. a +1.95%/+1.88% average improvement in coding with LLaMA2-7B/LLaMA3.1-8B, together with ablations over hyperparameter choices.

📝 Abstract
Mainstream large language model (LLM) fine-tuning protocols usually keep the model intact, and no prior work has questioned whether maintaining the model's integrity is indispensable for performance. In this work, we introduce Mask Fine-Tuning (MFT), a new LLM fine-tuning paradigm showing that properly breaking the integrity of the model can, surprisingly, improve performance. Specifically, MFT learns a set of binary masks supervised by the typical LLM fine-tuning objective. Extensive experiments show that MFT yields a consistent performance boost across various domains and backbones (e.g., a 1.95%/1.88% average gain in coding with LLaMA2-7B/LLaMA3.1-8B). Detailed studies examine MFT from different hyperparameter perspectives for better insight. In particular, MFT naturally extends the current LLM training protocol, as it is deployed on a complete, well-trained model. This study broadens the functionality of mask learning from its conventional network-pruning context for model compression to a more general scope.
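The core mechanism, learning a binary mask over frozen pretrained weights under the ordinary fine-tuning loss, can be sketched in PyTorch. This is a minimal illustration, not the paper's exact recipe: the `MaskedLinear` module, the threshold-at-zero binarization, and the straight-through estimator used to pass gradients through the hard mask are all assumptions made for the sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedLinear(nn.Module):
    """A frozen linear layer whose weights are gated by a learned binary mask.

    The pretrained weights are kept fixed; only real-valued mask scores are
    trained. The forward pass binarizes the scores (score > 0 keeps a weight),
    and a straight-through estimator lets gradients reach the scores.
    """

    def __init__(self, linear: nn.Linear):
        super().__init__()
        # Freeze copies of the pretrained weights: integrity of the values
        # is preserved; only which weights are *active* changes.
        self.weight = nn.Parameter(linear.weight.detach().clone(), requires_grad=False)
        self.bias = (nn.Parameter(linear.bias.detach().clone(), requires_grad=False)
                     if linear.bias is not None else None)
        # Real-valued mask scores, initialized slightly positive so the
        # initial binary mask keeps every weight (illustrative choice).
        self.mask_scores = nn.Parameter(torch.full_like(self.weight, 0.01))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        hard = (self.mask_scores > 0).float()  # binary 0/1 mask
        # Straight-through estimator: hard values in the forward pass,
        # identity gradient to mask_scores in the backward pass.
        mask = hard + self.mask_scores - self.mask_scores.detach()
        return F.linear(x, self.weight * mask, self.bias)

# Usage: wrap a pretrained layer, then optimize only the mask scores
# with the usual fine-tuning loss.
pretrained = nn.Linear(4, 3)
masked = MaskedLinear(pretrained)
x = torch.randn(2, 4)
loss = masked(x).sum()
loss.backward()  # gradients flow to mask_scores, not to the frozen weights
```

In a full MFT-style setup, every (or a chosen subset of) linear layer in the well-trained LLM would be wrapped this way, and the optimizer would receive only the `mask_scores` parameters.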
Problem

Research questions and friction points this paper is trying to address.

Challenges necessity of full model integrity in LLM fine-tuning
Proposes Mask Fine-Tuning to boost performance by selectively masking pretrained weights
Extends mask learning from pruning to general LLM optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces Mask Fine-Tuning (MFT) for LLMs
Uses binary masks to break model integrity
Improves performance across domains and backbones
Mingyuan Zhang
Department of ECE, College of Engineering, Northeastern University, Boston, USA

Yue Bai
Northwestern University, Northeastern University
Multi-modal learning · Sparse network training · Mask learning

Huan Wang
Department of ECE, College of Engineering, Northeastern University, Boston, USA

Yizhou Wang
Department of ECE, College of Engineering, Northeastern University, Boston, USA

Qihua Dong
Department of ECE, College of Engineering, Northeastern University, Boston, USA

Yun Fu
Khoury College of Computer Science, Northeastern University, Boston, USA