MLorc: Momentum Low-rank Compression for Large Language Model Adaptation

📅 2025-06-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the prohibitive memory overhead of full-parameter fine-tuning for large language models (LLMs), this paper proposes Momentum Low-Rank Compression (MLorc), a novel training paradigm that directly applies low-rank compression to optimizer momentum—rather than gradients or weight updates—for the first time. MLorc introduces a dynamic rank selection mechanism to preserve the update dynamics of full fine-tuning and provides theoretical convergence guarantees under standard assumptions. It is compatible with generic optimizers such as SGD and Adam. Experiments demonstrate that, at rank $r = 4$, MLorc matches or surpasses full-parameter fine-tuning across multiple LLMs and downstream tasks, while achieving memory and computational efficiency comparable to LoRA and GaLore. Its strong generalization across architectures and tasks underscores its robustness. The core innovation lies in adaptive low-rank compression applied explicitly to momentum, effectively balancing training efficiency and optimization fidelity.
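The summary above describes the core mechanic: store the optimizer's momentum in low-rank factored form, reconstruct it each step, apply the standard momentum update, and re-compress before the next step. Below is a minimal illustrative sketch of that idea for SGD with momentum, using a plain truncated SVD as the compressor; it is not the authors' implementation, and the function names and the SVD-based compression choice are assumptions for illustration only.

```python
import numpy as np

def truncated_svd(M, r):
    """Rank-r truncated SVD of M: returns factors U_r (m x r) and V_r (r x n)
    such that U_r @ V_r is the best rank-r approximation of M."""
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :r] * S[:r], Vt[:r, :]

def sgd_momentum_step(W, grad, U, V, lr=0.1, beta=0.9, r=4):
    """One SGD-with-momentum step where only the momentum is kept low-rank.

    Memory for momentum drops from m*n to r*(m + n) floats, while the
    weight update itself (W - lr * m) is NOT rank-constrained, in contrast
    to methods that factorize the weight update directly.
    """
    m = U @ V                    # reconstruct momentum from its factors
    m = beta * m + grad          # standard (full-rank) momentum update
    W = W - lr * m               # full-rank weight update
    U, V = truncated_svd(m, r)   # re-compress momentum to rank r for storage
    return W, U, V
```

Note the contrast this sketch makes concrete: the compression error touches only the *stored* momentum, so each individual weight update can still be full-rank even at a small rank such as $r = 4$.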

📝 Abstract
With the increasing size of large language models (LLMs), full-parameter fine-tuning imposes substantial memory demands. To alleviate this, we propose a novel memory-efficient training paradigm called Momentum Low-rank Compression (MLorc). By directly compressing and reconstructing momentum rather than gradients, MLorc avoids imposing a fixed-rank constraint on weight update matrices and better preserves the training dynamics of full-parameter fine-tuning, in contrast to existing low-rank approaches such as LoRA and GaLore. Empirically, MLorc consistently outperforms other memory-efficient training methods, matches or even exceeds the performance of full fine-tuning with a small rank (e.g., $r=4$), and generalizes well across different optimizers -- all while not compromising time or memory efficiency. Furthermore, we provide a theoretical guarantee for its convergence under reasonable assumptions.
Problem

Research questions and friction points this paper is trying to address.

Reduces memory demands in large language model fine-tuning
Avoids fixed-rank constraints on weight updates
Maintains performance while improving memory efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Compresses and reconstructs momentum directly
Avoids fixed-rank constraint on weight updates
Matches full fine-tuning with small rank
👥 Authors
Wei Shen, University of Virginia
Yaxiang Zhang, National University of Singapore (NUS), Ph.D.
Minhui Huang, Research Scientist (machine learning, optimization)
Mengfan Xu, University of Massachusetts at Amherst
Jiawei Zhang, University of Wisconsin-Madison
Cong Shen, University of Virginia