🤖 AI Summary
Existing PEFT methods such as LoRA deploy adapters uniformly across all layers of an LLM, ignoring inter-layer contribution heterogeneity and task-specific rank requirements, which leads to parameter redundancy and suboptimal efficiency. This paper proposes FLoE, a novel framework that introduces a Fisher-information-guided layer importance scoring mechanism, coupled with Bayesian optimization for task-aware automatic rank allocation. FLoE embeds LoRA modules into a Mixture-of-Experts (MoE) architecture, activating sparse low-rank experts only in critical layers. Evaluated across multiple models (Llama, Qwen) and benchmarks (Alpaca, MT-Bench), FLoE reduces trainable parameters by 37–52%, decreases GPU memory consumption by 41–49%, and cuts inference latency by 33%, while maintaining or even improving downstream task performance. The method markedly improves the efficiency–accuracy trade-off, enabling resource-efficient adaptation in constrained deployment scenarios.
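The Fisher-guided scoring idea can be sketched in a few lines: under a diagonal Fisher approximation, each layer's importance is the mean squared gradient of the log-likelihood over its parameters, and adapters are deployed only in the top-scoring layers. The sketch below is illustrative, not the paper's implementation; the names `fisher_layer_scores` and `top_k_layers` and the synthetic gradients are assumptions.

```python
# Minimal sketch of Fisher-information-guided layer selection (illustrative,
# not the paper's code). Per-layer log-likelihood gradients are simulated.
import numpy as np

def fisher_layer_scores(layer_grads):
    """Diagonal Fisher approximation: mean squared gradient per layer."""
    return np.array([np.mean(g ** 2) for g in layer_grads])

def top_k_layers(scores, k):
    """Indices of the k highest-scoring (most task-critical) layers."""
    return sorted(np.argsort(scores)[::-1][:k].tolist())

# Toy example: 4 layers whose gradients differ in magnitude, so their
# Fisher scores differ roughly as the square of the gradient scale.
rng = np.random.default_rng(0)
grads = [rng.normal(scale=s, size=128) for s in (0.1, 1.0, 0.5, 2.0)]
scores = fisher_layer_scores(grads)
print(top_k_layers(scores, k=2))  # the two layers with the largest scores
```

In a real setting the gradients would come from backpropagating the task loss on a calibration batch, and only the selected layers would receive MoE low-rank experts.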
📝 Abstract
Parameter-Efficient Fine-Tuning (PEFT) methods have emerged as a widely adopted strategy for adapting pre-trained Large Language Models (LLMs) to downstream tasks, significantly reducing memory and computational costs. However, most existing PEFT techniques uniformly deploy LoRA adapters across all layers, disregarding the intrinsic heterogeneity of layer contributions and task-specific rank requirements. This uniform paradigm leads to redundant parameter allocation and suboptimal adaptation efficiency. To address these limitations, we propose FLoE, a novel PEFT framework that introduces two key innovations: (i) a Fisher information-guided importance scoring mechanism to dynamically identify task-critical transformer layers for MoE-based low-rank adaptation, enabling sparse adapter deployment; and (ii) a Bayesian optimization-driven rank allocator that automatically determines optimal LoRA ranks for specific datasets without exhaustive grid search. Extensive experiments across diverse LLMs and benchmarks show that FLoE achieves a strong efficiency–accuracy trade-off, making it particularly advantageous in resource-constrained environments that require rapid adaptation.
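Innovation (ii) can be illustrated with a toy Bayesian optimization loop over a discrete set of candidate LoRA ranks: a Gaussian-process surrogate models validation score as a function of rank, and expected improvement picks the next rank to try. Everything here is an assumption for illustration, in particular the proxy objective `val_score` (standing in for validation accuracy) and the candidate set `RANKS`; it is not the paper's optimizer.

```python
# Illustrative Bayesian optimization over LoRA ranks with a tiny GP surrogate
# and expected-improvement acquisition. `val_score` is a synthetic proxy for
# validation accuracy (peaks at rank 8); all names are hypothetical.
from math import erf, sqrt, pi
import numpy as np

RANKS = np.array([2.0, 4.0, 8.0, 16.0, 32.0, 64.0])

def val_score(rank):
    # Toy proxy: quality peaks at rank 8, degrades for smaller/larger ranks.
    return -((np.log2(rank) - 3.0) ** 2)

def rbf(a, b, ls=2.0):
    # Squared-exponential kernel over log2(rank): rank doublings are equidistant.
    a, b = np.log2(a), np.log2(b)
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    # Standard GP regression posterior mean/variance at test points Xs.
    Kinv = np.linalg.inv(rbf(X, X) + noise * np.eye(len(X)))
    Ks = rbf(X, Xs)
    mu = Ks.T @ Kinv @ y
    var = np.clip(np.diag(rbf(Xs, Xs) - Ks.T @ Kinv @ Ks), 1e-12, None)
    return mu, var

def expected_improvement(mu, var, best):
    sigma = np.sqrt(var)
    z = (mu - best) / sigma
    Phi = np.array([0.5 * (1 + erf(v / sqrt(2))) for v in z])  # normal CDF
    phi = np.exp(-0.5 * z ** 2) / sqrt(2 * pi)                 # normal PDF
    return (mu - best) * Phi + sigma * phi

def bo_rank_search(n_init=2, n_iter=4, seed=0):
    rng = np.random.default_rng(seed)
    tried = list(rng.choice(len(RANKS), size=n_init, replace=False))
    ys = [val_score(RANKS[i]) for i in tried]
    for _ in range(n_iter):
        remaining = [i for i in range(len(RANKS)) if i not in tried]
        if not remaining:
            break
        mu, var = gp_posterior(RANKS[tried], np.array(ys), RANKS[remaining])
        pick = remaining[int(np.argmax(expected_improvement(mu, var, max(ys))))]
        tried.append(pick)
        ys.append(val_score(RANKS[pick]))
    return RANKS[tried[int(np.argmax(ys))]]

print(bo_rank_search())  # rank with the best observed proxy score
```

In practice each `val_score` evaluation would be a short fine-tuning run, which is exactly why a sample-efficient search is preferable to exhaustive grid search over ranks.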