🤖 AI Summary
To address the prohibitive memory overhead of full-parameter fine-tuning for large language models (LLMs), this paper proposes Momentum Low-rank Compression (MLorc), a novel training paradigm that, for the first time, applies low-rank compression directly to optimizer momentum rather than to gradients or weight updates. MLorc introduces a dynamic rank selection mechanism to preserve the update dynamics of full fine-tuning and provides theoretical convergence guarantees under standard assumptions. It is compatible with generic optimizers such as SGD and Adam. Experiments demonstrate that, at rank $r = 4$, MLorc matches or surpasses full-parameter fine-tuning across multiple LLMs and downstream tasks, while achieving memory and computational efficiency comparable to LoRA and GaLore. Its strong generalization across architectures and tasks underscores its robustness. The core innovation is adaptive low-rank compression applied explicitly to momentum, which effectively balances training efficiency and optimization fidelity.
📝 Abstract
With the increasing size of large language models (LLMs), full-parameter fine-tuning imposes substantial memory demands. To alleviate this, we propose a novel memory-efficient training paradigm called Momentum Low-rank Compression (MLorc). By directly compressing and reconstructing momentum rather than gradients, MLorc avoids imposing a fixed-rank constraint on weight update matrices and better preserves the training dynamics of full-parameter fine-tuning, in contrast to existing low-rank approaches such as LoRA and GaLore. Empirically, MLorc consistently outperforms other memory-efficient training methods, matches or even exceeds the performance of full fine-tuning with a small rank (e.g., $r=4$), and generalizes well across different optimizers -- all while not compromising time or memory efficiency. Furthermore, we provide a theoretical guarantee for its convergence under reasonable assumptions.
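To make the compress-and-reconstruct idea concrete, here is a minimal sketch of one momentum-SGD step in which the stored momentum is kept as rank-$r$ SVD factors and reconstructed only to apply the update. This is an illustration under assumed simplifications, not the paper's actual algorithm: it uses plain truncated SVD with a fixed rank (the paper's dynamic rank selection and Adam variant are omitted), and all function names are hypothetical.

```python
import numpy as np

def compress(m, r):
    # Truncated SVD: keep only the top-r singular triplets of the momentum
    # matrix, so storage is O((p + q) * r) instead of O(p * q).
    U, s, Vt = np.linalg.svd(m, full_matrices=False)
    return U[:, :r], s[:r], Vt[:r, :]

def reconstruct(U, s, Vt):
    # Rebuild the (rank-r) momentum matrix from its stored factors.
    return (U * s) @ Vt

def mlorc_sgd_step(W, grad, factors, lr=0.05, beta=0.9, r=4):
    """One SGD-with-momentum step where only low-rank momentum factors
    persist between iterations (illustrative sketch, not the paper's code).

    Note: the gradient itself is never rank-constrained; only the running
    momentum is compressed, which is the key distinction from projecting
    gradients (GaLore) or parameterizing updates (LoRA).
    """
    m = reconstruct(*factors) if factors is not None else np.zeros_like(W)
    m = beta * m + grad            # full-rank momentum accumulation
    factors = compress(m, r)       # re-compress before storing
    W = W - lr * reconstruct(*factors)
    return W, factors
```

As a toy usage, running this step on the quadratic objective $f(W) = \|W\|_F^2$ (gradient $2W$) steadily shrinks the iterate even though the stored momentum never exceeds rank 4.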