Memory-Efficient LLM Training by Various-Grained Low-Rank Projection of Gradients

📅 2025-05-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address memory constraints and inflexible gradient projection granularity in efficient fine-tuning of large language models (LLMs), this paper proposes the Variable-granularity Low-rank Projection (VLoRP) framework. Unlike existing low-rank projection (LoRP) methods that fix each row of the gradient matrix as the projection unit, VLoRP systematically investigates how projection granularity, ranging from row-wise through block-wise to full-matrix, shapes the memory–performance trade-off, introducing granularity as a novel optimization degree of freedom beyond conventional rank tuning. The authors further design ProjFactor, an adaptive optimizer supporting gradient accumulation, and provide theoretical convergence guarantees for SGD under VLoRP. Experiments on CommonsenseQA, MMLU, and GSM8K demonstrate that, under identical GPU memory budgets, VLoRP achieves more stable convergence and superior final accuracy compared to LoRA and LoRP, while reducing peak memory consumption by up to 42%.

📝 Abstract
Building upon the success of low-rank adaptation (LoRA), low-rank gradient projection (LoRP) has emerged as a promising solution for memory-efficient fine-tuning. However, existing LoRP methods typically treat each row of the gradient matrix as the default projection unit, leaving the role of projection granularity underexplored. In this work, we propose a novel framework, VLoRP, that extends low-rank gradient projection by introducing an additional degree of freedom for controlling the trade-off between memory efficiency and performance, beyond the rank hyperparameter. Through this framework, we systematically explore the impact of projection granularity, demonstrating that finer-grained projections lead to enhanced stability and efficiency even under a fixed memory budget. For the optimization of VLoRP, we present ProjFactor, an adaptive memory-efficient optimizer that significantly reduces memory requirements while ensuring competitive performance, even in the presence of gradient accumulation. Additionally, we provide a theoretical analysis of VLoRP, demonstrating the descent and convergence of its optimization trajectory under both SGD and ProjFactor. Extensive experiments validate our findings, covering tasks such as commonsense reasoning, MMLU, and GSM8K.
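To make the granularity idea concrete, here is a minimal NumPy sketch of variable-granularity low-rank gradient projection. It assumes (as one plausible reading of the abstract, not the paper's actual implementation) that granularity is controlled by reshaping the gradient matrix into rows of a chosen chunk length before applying a shared random projection; `project_gradient` and `reconstruct` are hypothetical helper names.

```python
import numpy as np

def project_gradient(grad, chunk, rank, seed=0):
    """Hypothetical sketch: reshape a gradient matrix into rows of
    length `chunk` (the projection granularity), then compress each
    row with a shared random low-rank projection.

    grad  : (m, n) gradient matrix; m*n must be divisible by chunk
    chunk : granularity; chunk == n recovers row-wise LoRP, and
            smaller chunk gives a finer-grained projection
    rank  : target rank of the projection
    """
    m, n = grad.shape
    assert (m * n) % chunk == 0
    rows = grad.reshape(-1, chunk)                        # (m*n/chunk, chunk)
    rng = np.random.default_rng(seed)
    P = rng.standard_normal((chunk, rank)) / np.sqrt(rank)
    return rows @ P, P                                    # (m*n/chunk, rank)

def reconstruct(compressed, P, shape):
    """Map the compressed gradient back to the parameter shape."""
    return (compressed @ P.T).reshape(shape)

# Usage: different granularities can occupy the same memory budget
# when rank scales with chunk size -- the trade-off the paper studies.
G = np.random.default_rng(1).standard_normal((8, 16))
c_coarse, _ = project_gradient(G, chunk=16, rank=4)  # row-wise unit
c_fine, _ = project_gradient(G, chunk=4, rank=1)     # finer-grained unit
assert c_coarse.size == c_fine.size == 32            # equal storage
```

Under this reading, the optimizer state (e.g., ProjFactor's moments) would live in the compressed space, and the reconstructed low-rank gradient would drive the parameter update.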
Problem

Research questions and friction points this paper is trying to address.

Projection granularity in low-rank gradient projection is underexplored: existing LoRP methods default to row-wise projection units
Memory-efficient fine-tuning lacks a flexible knob, beyond rank, for trading memory against performance
Adaptive optimization under low-rank projection must stay memory-efficient while remaining compatible with gradient accumulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces VLoRP for variable-grained low-rank gradient projection
Proposes ProjFactor for adaptive memory-efficient optimization
Analyzes VLoRP's convergence under SGD and ProjFactor