Long Exposure: Accelerating Parameter-Efficient Fine-Tuning for LLMs under Shadowy Sparsity

📅 2025-10-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Parameter-efficient fine-tuning (PEFT) of large language models (LLMs) suffers from computational inefficiency due to “shadowy sparsity”: a dynamic, fine-grained sparsity pattern that emerges during fine-tuning and is not exploited by existing acceleration methods. Method: This paper introduces Long Exposure, the first system explicitly designed to address shadowy sparsity. It combines a shadowy-sparsity exposer with a prolonged sensing range, a sequence-oriented sparse predictor, and dynamic-aware operators with coalesced memory access. Contribution/Results: Long Exposure achieves up to a 2.49× end-to-end speedup in PEFT over state-of-the-art methods while preserving fine-tuning accuracy, significantly reducing the computational cost of LLM adaptation.
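The paper itself is not reproduced on this page, so the following is only a minimal PyTorch sketch of the phenomenon the summary describes: activation sparsity during PEFT that is fine-grained (it varies token by token and neuron by neuron) and dynamic (it drifts as adapter weights change). All names here (HIDDEN, FFN, active_mask, the LoRA-style adapter) are illustrative assumptions, not the paper's code.

```python
import torch

torch.manual_seed(0)
HIDDEN, FFN, BATCH, SEQ = 64, 256, 4, 32

up = torch.nn.Linear(HIDDEN, FFN)                  # stands in for a frozen pre-trained projection
lora_a = torch.nn.Linear(HIDDEN, 8, bias=False)    # trainable low-rank adapter (LoRA-style)
lora_b = torch.nn.Linear(8, FFN, bias=False)
torch.nn.init.zeros_(lora_b.weight)                # LoRA convention: adapter starts as a no-op

def active_mask(x):
    """Boolean mask of FFN neurons that survive the ReLU, per token."""
    h = torch.relu(up(x) + lora_b(lora_a(x)))
    return h > 0

for step in range(3):
    x = torch.randn(BATCH, SEQ, HIDDEN)            # stand-in for token hidden states
    mask = active_mask(x)
    density = mask.float().mean(dim=-1)            # fraction of active neurons per token
    print(f"step {step}: mean density {density.mean():.2f}, "
          f"token-to-token std {density.std():.2f}")
    # The pattern is dynamic: as the adapter is updated (here, a dummy update),
    # the mask captured at one step goes stale at the next.
    with torch.no_grad():
        lora_b.weight.add_(0.01 * torch.randn_like(lora_b.weight))
```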

📝 Abstract
The adaptation of pre-trained large language models (LLMs) to diverse downstream tasks via fine-tuning is critical for numerous applications. However, the inefficiency of parameter-efficient fine-tuning (PEFT) techniques presents significant challenges in terms of time investments and operational costs. In this paper, we first introduce a nuanced form of sparsity, termed Shadowy Sparsity, which is distinctive in fine-tuning and has not been adequately addressed for acceleration. Under Shadowy Sparsity, we propose Long Exposure, an efficient system to accelerate PEFT for LLMs. Long Exposure comprises three key components: Shadowy-sparsity Exposer employs a prolonged sensing range to capture more sparsity details under shadowy sparsity; Sequence-oriented Predictor provides efficient yet accurate predictions to handle large sequence inputs and constantly-evolving parameters; and Dynamic-aware Operator facilitates more structured computational patterns and coalesced memory accesses, addressing dynamic sparse operations. Extensive evaluations show that Long Exposure outperforms state-of-the-arts with up to a 2.49× speedup in end-to-end fine-tuning, offering promising advancements in accelerating PEFT for LLMs.
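As one way to picture the Sequence-oriented Predictor mentioned in the abstract, here is a hedged sketch of a lightweight low-rank classifier that guesses, per token, which FFN neurons will be active, so that the heavy projections could later be restricted to that subset. The class name NeuronPredictor, the rank, and the threshold are assumptions for illustration; the paper's actual predictor design is not specified on this page.

```python
import torch

HIDDEN, FFN, RANK = 64, 256, 16

class NeuronPredictor(torch.nn.Module):
    """Tiny low-rank classifier that predicts, per token, which FFN neurons fire."""

    def __init__(self):
        super().__init__()
        self.down = torch.nn.Linear(HIDDEN, RANK, bias=False)  # cheap bottleneck
        self.up = torch.nn.Linear(RANK, FFN, bias=False)       # per-neuron logits

    def forward(self, x, threshold=0.5):
        logits = self.up(self.down(x))                  # (..., FFN)
        return torch.sigmoid(logits) > threshold        # predicted activity mask

predictor = NeuronPredictor()
x = torch.randn(2, 32, HIDDEN)                          # (batch, seq, hidden) token states
mask = predictor(x)                                     # which neurons to keep, per token
print("predicted active fraction:", mask.float().mean().item())
```

In practice such a predictor would be trained against the true activation masks observed during fine-tuning, and its cost must stay far below the FFN computation it helps to skip.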
Problem

Research questions and friction points this paper is trying to address.

Accelerating parameter-efficient fine-tuning for large language models
Addressing shadowy sparsity inefficiencies in fine-tuning processes
Optimizing computational patterns for dynamic sparse operations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Employs a prolonged sensing range to expose sparsity details hidden under shadowy sparsity
Uses a sequence-oriented predictor for efficient, accurate predictions over long sequences and evolving parameters
Implements dynamic-aware operators for structured sparse computation with coalesced memory accesses
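To make the last point concrete, below is a small, assumption-laden sketch of the idea behind a dynamic-aware sparse operator: gather only the rows and columns of the frozen FFN weights that a predictor marked as active, so the remaining work is a compact dense matmul with contiguous (coalescing-friendly) memory access rather than scattered sparse lookups. The tensors w_up, w_down, and active_idx are illustrative, not the paper's kernels.

```python
import torch

torch.manual_seed(0)
HIDDEN, FFN = 64, 256
w_up = torch.randn(FFN, HIDDEN)        # frozen up-projection weights
w_down = torch.randn(HIDDEN, FFN)      # frozen down-projection weights
x = torch.randn(8, HIDDEN)             # a tile of token hidden states

# Pretend a predictor marked roughly 25% of the FFN neurons as active for this tile.
active_idx = torch.nonzero(torch.rand(FFN) < 0.25).squeeze(-1)

# Dense reference path: computes every neuron, including the inactive ones.
dense = torch.relu(x @ w_up.t()) @ w_down.t()

# Gathered path: pack only the active rows/columns into contiguous buffers,
# so the two matmuls stay small and the memory accesses stay contiguous.
w_up_act = w_up.index_select(0, active_idx)        # (n_active, HIDDEN)
w_down_act = w_down.index_select(1, active_idx)    # (HIDDEN, n_active)
sparse = torch.relu(x @ w_up_act.t()) @ w_down_act.t()

# The two paths agree only when every skipped neuron truly outputs zero after
# the ReLU; a good predictor makes that hold for (almost) all skipped neurons.
print("active neurons:", active_idx.numel(), "of", FFN)
print("max deviation from dense:", (dense - sparse).abs().max().item())
```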
🔎 Similar Papers
No similar papers found.
Authors
Tuowei Wang
Microsoft Research, Beijing, China
Kun Li
Microsoft Research, Beijing, China
Zixu Hao
Microsoft Research, Beijing, China
Donglin Bai
Microsoft Research, Beijing, China
Ju Ren
Department of Computer Science and Technology, Tsinghua University
Internet of Things, Edge Computing/Intelligence, Security and Privacy
Yaoxue Zhang
Tsinghua University, Beijing, China
Ting Cao
Microsoft Research, Beijing, China
Mao Yang
Microsoft Research, Beijing, China