🤖 AI Summary
In large language model fine-tuning, activation memory overhead—particularly under large batch sizes and long contexts—has become a critical deployment bottleneck. This paper proposes LoRAct, an online low-rank compression method that leverages the intrinsic low-rank structure of activations: it requires no calibration data, achieves efficient online compression via sampling-based orthogonal decomposition, and provides a theoretically tight error bound. By replacing the widely used randomized SVD (RSVD) with an orthogonal decomposition tailored to low-rank matrices, LoRAct significantly improves computational efficiency. Evaluated on both vision and language tasks, LoRAct reduces activation memory by approximately 80% compared to LoRA, with negligible degradation in model performance. By enabling memory-efficient, calibration-free, and theoretically grounded activation compression, LoRAct establishes a practical paradigm for scalable fine-tuning.
📝 Abstract
The parameter-efficient fine-tuning paradigm has garnered significant attention with the advancement of foundation models. Although numerous methods have been proposed to reduce the number of trainable parameters, their substantial memory overhead remains a critical bottleneck that hinders practical deployment. In this paper, we observe that model activations constitute a major source of memory consumption, especially under large batch sizes and long context lengths; however, the rank of the activations remains consistently low. Motivated by this insight, we propose a memory-efficient fine-tuning approach, Low-Rank Activation Compression (LoRAct). Unlike prior work, LoRAct provides a more flexible and versatile compression strategy that can be applied online during the forward pass without the need for any calibration data. Moreover, LoRAct incorporates a novel sampling-based orthogonal decomposition algorithm specifically designed for low-rank matrices, offering improved computational efficiency and a tighter error bound compared to the widely used RSVD. Experiments on both vision and language tasks demonstrate the effectiveness of LoRAct. Notably, LoRAct further reduces activation memory by approximately 80% in comparison with the widely adopted LoRA method, while maintaining competitive performance. The source code is available at https://github.com/shijxcs/meft.
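The paper's sampling-based orthogonal decomposition is not reproduced here; as a rough illustration of the underlying idea—storing a low-rank factorization of an activation matrix instead of the full tensor—a generic RSVD-style random range finder can be sketched as follows. The function name and parameters are illustrative, not the authors' API:

```python
import numpy as np

def compress_activation(X, rank, oversample=8, seed=0):
    """Sketch: compress an (n x d) activation matrix X to rank-r factors.

    Uses a randomized range finder (the RSVD building block); LoRAct's
    sampling-based orthogonal decomposition is a more efficient variant.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Probe the column space of X with a random Gaussian test matrix.
    Omega = rng.standard_normal((d, rank + oversample))
    Q, _ = np.linalg.qr(X @ Omega)  # orthonormal basis, n x (rank + oversample)
    Q = Q[:, :rank]
    # Keep only the small factors; X is approximately Q @ B.
    B = Q.T @ X  # rank x d
    return Q, B

# Usage: a synthetic activation of exact rank 16 reconstructs almost exactly,
# while storage drops from n*d entries to r*(n + d).
rng = np.random.default_rng(1)
X = rng.standard_normal((512, 16)) @ rng.standard_normal((16, 1024))
Q, B = compress_activation(X, rank=16)
err = np.linalg.norm(X - Q @ B) / np.linalg.norm(X)
```

For an exactly rank-16 matrix the relative error is near machine precision; real activations are only approximately low-rank, which is where the paper's error bound applies.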