DLP-LoRA: Efficient Task-Specific LoRA Fusion with a Dynamic, Lightweight Plugin for Large Language Models

📅 2024-10-02
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address context-agnostic behavior and inefficient inference caused by static LoRA adapter fusion in multi-task settings, this paper proposes a sentence-level dynamic LoRA fusion mechanism. Unlike conventional token-level approaches, the method introduces a lightweight mini-MLP gating module (5M parameters) combined with top-p sampling to dynamically weight and fuse multiple LoRA adapters at the sentence level, enabling context-aware adaptation and parallelizable inference. The approach is fully compatible with the PEFT framework and requires no modification to the base model. Evaluated on 26 multi-task benchmarks, it achieves an average accuracy of 92.34% on multiple-choice tasks and significant improvements in BLEU and ROUGE scores, while keeping inference latency within twice that of a single LoRA. Its core contribution is the first lightweight, plug-and-play architecture supporting sentence-granular, context-driven, low-overhead dynamic LoRA fusion.

๐Ÿ“ Abstract
Recent advancements in Large Language Models (LLMs) have achieved robust performance across diverse tasks, but fine-tuning these models for specific domains remains resource-intensive. Parameter-Efficient Fine-Tuning (PEFT) methods like Low-Rank Adaptation (LoRA) address this challenge by fine-tuning a small subset of parameters. However, existing methods for fusing multiple LoRAs lack dynamic fusion based on contextual inputs and often increase inference time due to token-level operations. We propose DLP-LoRA, a Dynamic Lightweight Plugin that employs a mini-MLP module with only 5M parameters to dynamically fuse multiple LoRAs at the sentence level using top-p sampling strategies. This approach reduces inference time to less than twice that of single LoRA inference by leveraging parallel computation. Evaluations across 26 tasks, including multiple-choice questions and question answering, demonstrate that DLP-LoRA achieves an average accuracy of 92.34% on multiple-choice datasets and significant improvements in BLEU and ROUGE scores on QA datasets, outperforming different LLM backbones under composite task settings. DLP-LoRA effectively balances performance and efficiency, making it a practical solution for dynamic multi-task adaptation in LLMs. Our code is available at https://github.com/MeCuping/DLP-LoRA.
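The core mechanism the abstract describes (a mini-MLP gate scoring candidate LoRA adapters per sentence, top-p selection of a small adapter subset, and a weighted fusion of their low-rank deltas) can be sketched roughly as follows. This is a minimal pure-Python illustration under assumed toy dimensions; the function names (`mini_mlp_router`, `top_p_select`, `fuse_loras`) and all sizes are hypothetical and not taken from the authors' released code.

```python
import math
import random

random.seed(0)

def rand_matrix(rows, cols, scale=0.1):
    return [[random.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def mini_mlp_router(sent_emb, W1, W2):
    # Tiny 2-layer MLP gate: pooled sentence embedding -> logits over adapters.
    h = [max(0.0, sum(e * w for e, w in zip(sent_emb, col))) for col in zip(*W1)]
    return [sum(x * w for x, w in zip(h, col)) for col in zip(*W2)]

def top_p_select(logits, p=0.9):
    # Nucleus (top-p) selection: keep the smallest set of adapters whose
    # softmax mass reaches p, then renormalize the kept weights.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    keep, mass = [], 0.0
    for i in order:
        keep.append(i)
        mass += probs[i]
        if mass >= p:
            break
    z = sum(probs[i] for i in keep)
    return keep, [probs[i] / z for i in keep]

def fuse_loras(W_base, lora_As, lora_Bs, keep, weights):
    # Sentence-level fusion: W_base + sum_i w_i * (B_i @ A_i).
    d = len(W_base)
    fused = [row[:] for row in W_base]
    for i, w in zip(keep, weights):
        delta = matmul(lora_Bs[i], lora_As[i])  # (d x r) @ (r x d) -> (d x d)
        for r_ in range(d):
            for c in range(d):
                fused[r_][c] += w * delta[r_][c]
    return fused

# Toy dimensions: d_model=4, rank=2, 3 candidate adapters, hidden=8.
d, r, n_adapters, hdim = 4, 2, 3, 8
W_base = rand_matrix(d, d, scale=1.0)
lora_As = [rand_matrix(r, d) for _ in range(n_adapters)]
lora_Bs = [rand_matrix(d, r) for _ in range(n_adapters)]
W1 = rand_matrix(d, hdim, scale=1.0)
W2 = rand_matrix(hdim, n_adapters, scale=1.0)

sent_emb = [random.gauss(0.0, 1.0) for _ in range(d)]  # pooled sentence vector
logits = mini_mlp_router(sent_emb, W1, W2)
keep, weights = top_p_select(logits, p=0.9)
W_fused = fuse_loras(W_base, lora_As, lora_Bs, keep, weights)
```

Because the gate fires once per sentence rather than per token, the fused weight matrix can be computed a single time and reused for the whole sequence, which is how the paper keeps inference under twice the latency of a single LoRA.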
Problem

Research questions and friction points this paper is trying to address.

Dynamic fusion of multiple LoRAs
Reduction of inference time
Efficient multi-task adaptation in LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic lightweight plugin
Sentence-level LoRA fusion
Mini-MLP with 5M parameters
Yuxuan Zhang
Department of Computing Science, University of Aberdeen; Aberdeen Institute of Data Science and Artificial Intelligence, South China Normal University
Ruizhe Li
Department of Computing Science, University of Aberdeen