LIFT the Veil for the Truth: Principal Weights Emerge after Rank Reduction for Reasoning-Focused Supervised Fine-Tuning

📅 2025-06-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Full-parameter fine-tuning of large language models (LLMs) incurs prohibitive computational costs and risks overfitting and catastrophic forgetting, while existing sparse fine-tuning methods lack precision in identifying inference-critical parameters. Method: We propose low-rank-guided sparse fine-tuning, which applies singular value decomposition (SVD) to obtain a low-rank approximation of each weight matrix and updates only the top 5% of parameters by magnitude in that approximation, termed "principal weights". Crucially, we empirically find that these principal weights, identified after low-rank approximation, are the most critical for reasoning; moreover, magnitude-based selection, which performs poorly on the original weights, becomes highly effective in the low-rank subspace. Results: Our method outperforms full fine-tuning on arithmetic reasoning tasks, matches the memory efficiency of LoRA, and improves source-domain knowledge retention by ~20% over both full fine-tuning and LoRA, significantly advancing sparse fine-tuning for LLM reasoning.
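The selection step described above can be sketched as follows. This is an illustrative reading of the method, not the paper's implementation: the rank, density, and matrix size are assumed values, and `principal_weight_mask` is a hypothetical helper name.

```python
import numpy as np

def principal_weight_mask(W, rank=8, density=0.05):
    """Select 'principal weights': the top-`density` fraction of entries
    by magnitude in a rank-`rank` SVD approximation of W.
    (rank and density are illustrative assumptions, not the paper's config.)"""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    W_lr = (U[:, :rank] * S[:rank]) @ Vt[:rank]     # rank-r approximation of W
    k = max(1, int(density * W.size))               # number of weights to keep
    thresh = np.partition(np.abs(W_lr).ravel(), -k)[-k]  # k-th largest magnitude
    return np.abs(W_lr) >= thresh                   # boolean mask over W's entries

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
mask = principal_weight_mask(W)     # marks roughly 5% of entries for updating
```

The key point is that the magnitudes are ranked in the low-rank approximation `W_lr`, not in the original `W`; only the resulting mask is then used to decide which entries of `W` get trained.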

📝 Abstract
Recent studies have shown that supervised fine-tuning of LLMs on a small number of high-quality datasets can yield strong reasoning capabilities. However, full fine-tuning (Full FT), while powerful, is computationally expensive and susceptible to overfitting and catastrophic forgetting, particularly when data is limited. Sparse fine-tuning, which previously achieved notable success by updating only a small subset of model parameters, offers a promising trade-off between efficiency and effectiveness. Yet, it has lagged behind in the LLM era due to the difficulty of identifying parameters truly critical for reasoning. In this work, we show that weights with the largest magnitude after low-rank approximation are critical weights for fine-tuning, which we call Principal Weights. Surprisingly, while magnitude-based sparse fine-tuning performs poorly as a baseline on LLM fine-tuning, it becomes highly effective after rank reduction. These insights motivate our method: Low-rank Informed Sparse Fine-Tuning (LIFT). LIFT only updates the top 5% Principal Weights throughout training and consistently achieves better performance on reasoning tasks than Full FT, while maintaining memory efficiency on par with popular parameter-efficient fine-tuning methods. In addition to strong performance on target domains such as arithmetic reasoning, LIFT also retains up to 20% more source-domain knowledge, compared to Full FT and LoRA. Our code is available at: https://github.com/zihanghliu/LIFT.
Problem

Research questions and friction points this paper is trying to address.

Identifies critical weights for efficient LLM fine-tuning
Improves reasoning performance with sparse parameter updates
Reduces memory usage while retaining source-domain knowledge
Innovation

Methods, ideas, or system contributions that make the work stand out.

Low-rank approximation identifies Principal Weights
Sparse fine-tuning updates top 5% Principal Weights
LIFT method balances efficiency and reasoning performance
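Given such a mask, the training loop reduces to a masked update: only the selected 5% of entries receive gradients, and all other weights stay frozen. A minimal sketch, assuming plain SGD and a precomputed boolean mask (`sparse_sgd_step` is a hypothetical helper, not from the paper's codebase):

```python
import numpy as np

def sparse_sgd_step(W, grad, mask, lr=1e-2):
    """Apply an SGD update only to the masked ('principal') entries of W;
    unmasked entries are left untouched."""
    W_new = W.copy()
    W_new[mask] -= lr * grad[mask]   # update only where mask is True
    return W_new

rng = np.random.default_rng(1)
W = rng.standard_normal((4, 4))
grad = rng.standard_normal((4, 4))
mask = np.zeros_like(W, dtype=bool)
mask[0, 0] = True                    # pretend a single principal weight
W2 = sparse_sgd_step(W, grad, mask)  # only W[0, 0] changes
```

In practice this keeps optimizer state only for the masked entries, which is where the LoRA-comparable memory footprint reported above would come from.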
Zihang Liu
University of California, Berkeley, CA, USA

Tianyu Pang
Dartmouth College, NH, USA

Oleg Balabanov
UC Berkeley and ICSI
Numerical Analysis, Machine Learning, Randomized Linear Algebra, Model Order Reduction

Chaoqun Yang
Tsinghua University, China

Tianjin Huang
Asst. Professor, CS@University of Exeter & Research Fellow, CS@TU/e
LLMs, Adversarial Examples, Stable Training, Graph Neural Networks, Sparse Training

Lu Yin
University of Surrey, Guildford, UK; Eindhoven University of Technology, the Netherlands

Yaoqing Yang
Assistant Professor@Dartmouth CS
Machine Learning Model Diagnostics, Structured Data, Information Theory

Shiwei Liu
University of Oxford, Oxford, UK