🤖 AI Summary
This work investigates whether preserving model integrity is a necessary condition for performance improvement in large language model (LLM) fine-tuning. To address this, we propose Mask Fine-Tuning (MFT), a parameter-efficient method that learns binary masks end-to-end, supervised by the standard fine-tuning objective, to selectively activate subsets of frozen pretrained weights, adding no new weights to the deployed model and requiring no architectural modifications. We provide empirical evidence that model integrity is *not* essential for effective fine-tuning, thereby repositioning mask learning from a compression technique to a general-purpose performance-enhancement paradigm. Extensive experiments across multiple domains and backbones show that MFT delivers consistent gains, e.g., average coding improvements of 1.95% on LLaMA2-7B and 1.88% on LLaMA3.1-8B, together with stable training, plug-and-play deployment, cross-task generalizability, and robustness to hyperparameter variation.
📝 Abstract
Mainstream large language model (LLM) fine-tuning protocols usually keep the model intact, and no prior work has questioned whether maintaining the integrity of the model is indispensable for performance. In this work, we introduce Mask Fine-Tuning (MFT), a new LLM fine-tuning paradigm showing that properly breaking the integrity of the model can, surprisingly, improve performance. Specifically, MFT learns a set of binary masks supervised by the standard LLM fine-tuning objective. Extensive experiments show that MFT yields a consistent performance boost across various domains and backbones (e.g., average gains of 1.95%/1.88% in coding with LLaMA2-7B/3.1-8B). We further study MFT from multiple hyperparameter perspectives to provide deeper insight. Notably, MFT naturally extends the current LLM training protocol by being deployed on top of a complete, well-trained model. This study broadens the functionality of mask learning from its conventional network-pruning context for model compression to a more general scope.
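To make the core idea concrete, below is a minimal NumPy sketch of learning a binary mask over a single frozen weight matrix with a straight-through estimator. This is an illustration under our own assumptions (a toy linear-regression task, sign-based mask parameterization, hypothetical variable names); the paper's actual mask parameterization, objective, and training details may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# A "pretrained" frozen weight matrix and a synthetic regression task whose
# target is reachable by masking out some entries of W.
W = rng.normal(size=(4, 4))
X = rng.normal(size=(64, 4))
true_mask = (rng.random((4, 4)) > 0.3).astype(W.dtype)
y = X @ (W * true_mask)

# Learnable real-valued scores; the hard binary mask is their sign.
scores = np.zeros_like(W)
lr = 0.1

def mse(scores):
    """Loss of the frozen weights under the current hard binary mask."""
    mask = (scores >= 0).astype(W.dtype)
    return float(np.mean((X @ (W * mask) - y) ** 2))

init_loss = mse(scores)  # scores start at 0, so the mask starts all-ones

for _ in range(200):
    mask = (scores >= 0).astype(W.dtype)       # hard binary mask
    err = X @ (W * mask) - y
    grad_masked_W = X.T @ err / len(X)         # dL/d(W * mask)
    # Straight-through estimator: treat d(mask)/d(scores) as identity and
    # update the scores with the gradient w.r.t. the mask, dL/d(mask).
    scores -= lr * grad_masked_W * W

final_loss = mse(scores)
```

W itself is never updated; only the mask scores are trained, which mirrors how MFT keeps the pretrained weights frozen while learning which subset of them to activate.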