🤖 AI Summary
To address the challenge of balancing parameter efficiency and model capacity in large language model (LLM) fine-tuning, this paper proposes Structural Mixture of Residual Experts (S'MoRE). S'MoRE introduces two mechanisms: hierarchical low-rank decomposition of expert weights and sub-tree-based routing, casting the inter-layer propagation of residuals as a special type of graph neural network (GNN). Under a similar parameter budget, this design achieves an exponential gain in structural flexibility over conventional Mixture-of-Experts (MoE) while retaining LoRA-level parameter counts. Theoretical analysis establishes that S'MoRE's expressivity upper bound strictly dominates that of existing methods. Empirical evaluation across multiple downstream tasks demonstrates consistent gains over both LoRA and MoE baselines, with higher parameter efficiency, i.e., greater accuracy gain per added parameter, thereby combining strong representational capacity with exceptional parameter economy.
📝 Abstract
Fine-tuning pre-trained large language models (LLMs) presents a dual challenge of balancing parameter efficiency and model capacity. Existing methods like low-rank adaptations (LoRA) are efficient but lack flexibility, while Mixture-of-Experts (MoE) architectures enhance model capacity at the cost of more under-utilized parameters. To address these limitations, we propose Structural Mixture of Residual Experts (S'MoRE), a novel framework that seamlessly integrates the efficiency of LoRA with the flexibility of MoE. Specifically, S'MoRE employs hierarchical low-rank decomposition of expert weights, yielding residuals of varying orders interconnected in a multi-layer structure. By routing input tokens through sub-trees of residuals, S'MoRE emulates the capacity of many experts by instantiating and assembling just a few low-rank matrices. We craft the inter-layer propagation of S'MoRE's residuals as a special type of Graph Neural Network (GNN), and prove that under similar parameter budget, S'MoRE improves "structural flexibility" of traditional MoE (or Mixture-of-LoRA) by exponential order. Comprehensive theoretical analysis and empirical results demonstrate that S'MoRE achieves superior fine-tuning performance, offering a transformative approach for efficient LLM adaptation.
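The core idea of assembling many effective experts from a few low-rank residuals can be sketched in a toy form. The snippet below is a minimal illustration only, not the paper's implementation: the layer sizes, the hash-based `route` stand-in for a learned gate, and the additive assembly of the routed sub-tree are all hypothetical choices made for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 16, 2          # hidden dimension, low rank (illustrative values)
n1, n2 = 3, 2         # residual experts per layer (illustrative values)

# Each residual expert stores a low-rank factor pair U (d x r), V (r x d).
layer1 = [(rng.standard_normal((d, r)), rng.standard_normal((r, d))) for _ in range(n1)]
layer2 = [(rng.standard_normal((d, r)), rng.standard_normal((r, d))) for _ in range(n2)]

def route(x, n):
    # Toy deterministic router: a stand-in for a learned gating function.
    return int(abs(x.sum()) * 1000) % n

def smore_update(x):
    # First-order residual, selected by routing the input token.
    U1, V1 = layer1[route(x, n1)]
    h1 = U1 @ (V1 @ x)
    # Higher-order residual, selected by routing the first-order output,
    # so the token traverses a sub-tree of interconnected residuals.
    U2, V2 = layer2[route(h1, n2)]
    h2 = U2 @ (V2 @ h1)
    # Assemble the routed residuals on top of the input.
    return x + h1 + h2

x = rng.standard_normal(d)
y = smore_update(x)
print(y.shape)  # (16,)
```

Even in this toy setting, the two layers expose n1 * n2 = 6 distinct residual compositions while storing only n1 + n2 = 5 low-rank factor pairs; with more layers the number of routable sub-trees grows multiplicatively, which is the intuition behind the exponential structural-flexibility claim.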