SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity

📅 2025-06-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the high computational cost of large language model (LLM) fine-tuning, this paper proposes SparseLoRA, a fine-tuning method that exploits contextual sparsity. Parameter-efficient approaches such as QLoRA and DoRA reduce the number of trainable parameters and the memory footprint, but they do not reduce computational cost. SparseLoRA instead uses a lightweight, training-free SVD-based sparsity estimator to dynamically select a sparse subset of weights for loss and gradient computation, and it analyzes and addresses sensitivity across layers, tokens, and training steps. Experiments show up to a 2.2× reduction in computational cost and a measured speedup of up to 1.6×, while maintaining accuracy across downstream tasks including commonsense reasoning, arithmetic reasoning, code generation, and instruction following.

📝 Abstract

Fine-tuning LLMs is both computationally and memory-intensive. While parameter-efficient fine-tuning methods, such as QLoRA and DoRA, reduce the number of trainable parameters and lower memory usage, they do not decrease computational cost. In some cases, they may even slow down fine-tuning. In this paper, we introduce SparseLoRA, a method that accelerates LLM fine-tuning through contextual sparsity. We propose a lightweight, training-free SVD sparsity estimator that dynamically selects a sparse subset of weights for loss and gradient computation. We also systematically analyze and address sensitivity across layers, tokens, and training steps. Our experimental results show that SparseLoRA reduces computational cost by up to 2.2× and delivers a measured speedup of up to 1.6× while maintaining accuracy across various downstream tasks, including commonsense and arithmetic reasoning, code generation, and instruction following.
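The abstract does not spell out how the SVD sparsity estimator works, so the following is only a minimal NumPy sketch of one plausible reading: a low-rank SVD factorization of a weight matrix is computed once (no training involved), then used to cheaply estimate which output channels matter for the current input, and only those rows of the full weight are used. The function names (`build_estimator`, `select_channels`, `sparse_linear`), the norm-based channel score, and the `keep_ratio` parameter are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def build_estimator(W, rank):
    """Factor W (d_out x d_in) into a rank-`rank` approximation A @ B, computed once."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # (d_out, rank), singular values folded in
    B = Vt[:rank, :]             # (rank, d_in)
    return A, B

def select_channels(x, A, B, keep_ratio):
    """Estimate per-channel output magnitude from the low-rank proxy; keep the top ones."""
    approx = (x @ B.T) @ A.T                   # (n_tokens, d_out), cheap when rank is small
    score = np.linalg.norm(approx, axis=0)     # hypothetical score: L2 norm over tokens
    k = max(1, int(round(keep_ratio * A.shape[0])))
    idx = np.argpartition(score, -k)[-k:]      # indices of the k largest scores
    return np.sort(idx)

def sparse_linear(x, W, idx):
    """Compute only the selected output channels; skipped channels stay zero."""
    y = np.zeros((x.shape[0], W.shape[0]), dtype=x.dtype)
    y[:, idx] = x @ W[idx].T
    return y

# Illustrative shapes only (not taken from the paper).
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))
x = rng.standard_normal((8, 32))
A, B = build_estimator(W, rank=8)
idx = select_channels(x, A, B, keep_ratio=0.5)
y = sparse_linear(x, W, idx)
```

Because the selection depends on `x`, the kept channels change from input to input, which is the sense in which the sparsity is contextual. In a real training setup the same index set would also restrict the backward pass; that part is omitted here.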

Problem

Research questions and friction points this paper is trying to address.

Reducing computational cost in LLM fine-tuning

Addressing sensitivity across layers and tokens

Maintaining accuracy while accelerating training

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic sparse weight selection via SVD estimator

Layer-token-step sensitivity analysis and optimization

Training acceleration with maintained task accuracy
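To make the cost argument concrete, here is a rough back-of-the-envelope FLOP model for a single linear layer's forward pass, assuming sparsity is applied by keeping a fraction of output channels and that a rank-r estimator runs on every call. The layer sizes, `keep_ratio`, and `rank` values below are hypothetical, and the model ignores the backward pass, attention, and the LoRA branches, so it is not expected to reproduce the reported 2.2× figure.

```python
def linear_flops(n_tokens, d_in, d_out):
    """Multiply-add count (as 2 FLOPs each) for a dense x @ W.T."""
    return 2 * n_tokens * d_in * d_out

def sparse_linear_flops(n_tokens, d_in, d_out, keep_ratio, rank):
    """Dense cost over the kept channels plus the low-rank estimator overhead."""
    kept = int(round(keep_ratio * d_out))
    estimator = 2 * n_tokens * rank * (d_in + d_out)  # x @ B.T, then @ A.T
    return linear_flops(n_tokens, d_in, kept) + estimator

# Hypothetical layer shape and settings, chosen only for illustration.
dense = linear_flops(1024, 4096, 11008)
sparse = sparse_linear_flops(1024, 4096, 11008, keep_ratio=0.4, rank=8)
print(f"estimated forward FLOP reduction: {dense / sparse:.2f}x")
```

The point of the sketch is that the estimator term scales with `rank * (d_in + d_out)` rather than `d_in * d_out`, so for small ranks its overhead is minor compared with the computation it allows the layer to skip.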

🔎 Similar Papers

Sparse Matrix in Large Language Model Fine-tuning

2024-05-24 · arXiv.org · Citations: 3

SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs

2024-05-25 · arXiv.org · Citations: 2
