🤖 AI Summary
Existing PEFT methods such as LoRA deploy adapters uniformly across all layers of an LLM, ignoring inter-layer contribution heterogeneity and task-specific rank requirements, which leads to parameter redundancy and suboptimal efficiency. This paper proposes FLoE, a novel framework that introduces a Fisher-information-guided layer importance scoring mechanism, coupled with Bayesian optimization for task-aware automatic rank allocation. FLoE embeds LoRA modules into a Mixture-of-Experts (MoE) architecture, activating sparse low-rank experts only in critical layers. Evaluated across multiple models (Llama, Qwen) and benchmarks (Alpaca, MT-Bench), FLoE reduces trainable parameters by 37–52%, decreases GPU memory consumption by 41–49%, and cuts inference latency by 33%, while maintaining or even improving downstream task performance. The method markedly improves the efficiency–accuracy trade-off, enabling resource-efficient adaptation in constrained deployment scenarios.
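The Fisher-guided scoring idea can be sketched in a few lines: under a diagonal Fisher approximation, each layer's importance is the mean squared gradient of the log-likelihood over its parameters, and adapters are deployed only in the top-scoring layers. The sketch below is illustrative, not the paper's implementation; the names `fisher_layer_scores` and `top_k_layers` and the synthetic gradients are assumptions.

```python
# Minimal sketch of Fisher-information-guided layer selection (illustrative,
# not the paper's code). Per-layer log-likelihood gradients are simulated.
import numpy as np

def fisher_layer_scores(layer_grads):
    """Diagonal Fisher approximation: mean squared gradient per layer."""
    return np.array([np.mean(g ** 2) for g in layer_grads])

def top_k_layers(scores, k):
    """Indices of the k highest-scoring (most task-critical) layers."""
    return sorted(np.argsort(scores)[::-1][:k].tolist())

# Toy example: 4 layers whose gradients differ in magnitude, so their
# Fisher scores differ roughly as the square of the gradient scale.
rng = np.random.default_rng(0)
grads = [rng.normal(scale=s, size=128) for s in (0.1, 1.0, 0.5, 2.0)]
scores = fisher_layer_scores(grads)
print(top_k_layers(scores, k=2))  # the two layers with the largest scores
```

In a real setting the gradients would come from backpropagating the task loss on a calibration batch, and only the selected layers would receive MoE low-rank experts.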
📝 Abstract
Parameter-Efficient Fine-Tuning (PEFT) methods have emerged as a widely adopted strategy for adapting pre-trained Large Language Models (LLMs) to downstream tasks, significantly reducing memory and computational costs. However, most existing PEFT techniques uniformly deploy LoRA adapters across all layers, disregarding the intrinsic heterogeneity of layer contributions and task-specific rank requirements. This uniform paradigm leads to redundant parameter allocation and suboptimal adaptation efficiency. To address these limitations, we propose FLoE, a novel PEFT framework that introduces two key innovations: (i) a Fisher information-guided importance scoring mechanism to dynamically identify task-critical transformer layers for MoE-based low-rank adaptation, enabling sparse adapter deployment; and (ii) a Bayesian optimization-driven rank allocator that automatically determines optimal LoRA ranks for specific datasets without exhaustive grid search. Extensive experiments across diverse LLMs and benchmarks show that FLoE achieves a strong efficiency–accuracy trade-off, making it particularly advantageous in resource-constrained environments that require rapid adaptation.
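Innovation (ii) can be illustrated with a toy Bayesian optimization loop over a discrete set of candidate LoRA ranks: a Gaussian-process surrogate models validation score as a function of rank, and expected improvement picks the next rank to try. Everything here is an assumption for illustration, in particular the proxy objective `val_score` (standing in for validation accuracy) and the candidate set `RANKS`; it is not the paper's optimizer.

```python
# Illustrative Bayesian optimization over LoRA ranks with a tiny GP surrogate
# and expected-improvement acquisition. `val_score` is a synthetic proxy for
# validation accuracy (peaks at rank 8); all names are hypothetical.
from math import erf, sqrt, pi
import numpy as np

RANKS = np.array([2.0, 4.0, 8.0, 16.0, 32.0, 64.0])

def val_score(rank):
    # Toy proxy: quality peaks at rank 8, degrades for smaller/larger ranks.
    return -((np.log2(rank) - 3.0) ** 2)

def rbf(a, b, ls=2.0):
    # Squared-exponential kernel over log2(rank): rank doublings are equidistant.
    a, b = np.log2(a), np.log2(b)
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    # Standard GP regression posterior mean/variance at test points Xs.
    Kinv = np.linalg.inv(rbf(X, X) + noise * np.eye(len(X)))
    Ks = rbf(X, Xs)
    mu = Ks.T @ Kinv @ y
    var = np.clip(np.diag(rbf(Xs, Xs) - Ks.T @ Kinv @ Ks), 1e-12, None)
    return mu, var

def expected_improvement(mu, var, best):
    sigma = np.sqrt(var)
    z = (mu - best) / sigma
    Phi = np.array([0.5 * (1 + erf(v / sqrt(2))) for v in z])  # normal CDF
    phi = np.exp(-0.5 * z ** 2) / sqrt(2 * pi)                 # normal PDF
    return (mu - best) * Phi + sigma * phi

def bo_rank_search(n_init=2, n_iter=4, seed=0):
    rng = np.random.default_rng(seed)
    tried = list(rng.choice(len(RANKS), size=n_init, replace=False))
    ys = [val_score(RANKS[i]) for i in tried]
    for _ in range(n_iter):
        remaining = [i for i in range(len(RANKS)) if i not in tried]
        if not remaining:
            break
        mu, var = gp_posterior(RANKS[tried], np.array(ys), RANKS[remaining])
        pick = remaining[int(np.argmax(expected_improvement(mu, var, max(ys))))]
        tried.append(pick)
        ys.append(val_score(RANKS[pick]))
    return RANKS[tried[int(np.argmax(ys))]]

print(bo_rank_search())  # rank with the best observed proxy score
```

In practice each `val_score` evaluation would be a short fine-tuning run, which is exactly why a sample-efficient search is preferable to exhaustive grid search over ranks.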