MASA: Rethinking the Representational Bottleneck in LoRA with Multi-A Shared Adaptation

📅 2025-10-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
LoRA suffers from a representational bottleneck: its single low-rank down-projection matrix A limits the capacity to model the diverse signals required by complex tasks. To address this, we propose MASA, a multi-A, single-B architecture in which an ensemble of heterogeneous expert A matrices, shared asymmetrically across layers, captures rich low-dimensional features that a single layer-specific B matrix integrates back into the high-dimensional space. Crucially, MASA retains the same trainable parameter count as standard LoRA (0.52% of the base model). We validate the method across multi-domain generalization, single-domain specialization, and multi-task reasoning settings. On the MMLU benchmark, MASA achieves a mean accuracy of 59.62%, outperforming standard LoRA by 1.08 percentage points (a relative improvement of 1.84%), demonstrating enhanced adaptability for parameter-efficient fine-tuning of large language models.

📝 Abstract
Low-Rank Adaptation (LoRA) has emerged as a dominant method in Parameter-Efficient Fine-Tuning (PEFT) for large language models; it augments each transformer layer with one down-projection $A$ and one up-projection $B$. However, LoRA's reliance on a single down-projection matrix ($A$) creates a representational bottleneck, as this solitary feature extractor is inherently insufficient for capturing the diverse signals required by complex tasks. This motivates an architectural shift toward enriching feature adaptation to improve downstream task performance. We propose MASA (Multi-$A$ Shared Adaptation), an architecture that implements a multi-$A$, single-$B$ structure in which the multi-$A$ expert ensemble is asymmetrically shared across layers to preserve parameter efficiency. In MASA, these specialized experts capture diverse features, which are then integrated by a single, layer-specific $B$-matrix. The effectiveness and versatility of our method are validated through a comprehensive suite of experiments spanning multi-domain generalization, single-domain specialization, and multi-task reasoning. For example, on the MMLU benchmark, MASA achieves an average accuracy of 59.62%, outperforming standard LoRA by 1.08 points (a relative improvement of 1.84%) while training a comparable fraction of parameters (0.52%).
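The multi-A, single-B structure described above can be sketched in a few lines of PyTorch. This is a minimal illustration under stated assumptions: the class name `MASALayer`, the concatenation of expert outputs, and the use of a shared `nn.ModuleList` to realize cross-layer sharing are all choices made here for clarity, not the paper's exact formulation (which shares the experts asymmetrically across layers).

```python
import torch
import torch.nn as nn


class MASALayer(nn.Module):
    """Hypothetical sketch of a multi-A, single-B LoRA update:
    several down-projection experts A_i (shared across layers)
    extract low-dimensional features, and one layer-specific
    up-projection B integrates them."""

    def __init__(self, shared_A_experts: nn.ModuleList, d_out: int):
        super().__init__()
        # Passing the same ModuleList instance to every layer shares
        # the expert parameters across the whole network.
        self.A_experts = shared_A_experts
        rank_total = sum(a.out_features for a in shared_A_experts)
        # Single layer-specific up-projection; zero-initialized so the
        # adapter contributes nothing at the start, as in standard LoRA.
        self.B = nn.Linear(rank_total, d_out, bias=False)
        nn.init.zeros_(self.B.weight)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Each expert produces an r-dimensional feature; concatenate
        # them and let B mix them back into the model dimension.
        feats = torch.cat([A(x) for A in self.A_experts], dim=-1)
        return self.B(feats)  # added to the frozen base layer's output
```

A usage sketch: build one shared expert ensemble, then hand it to each layer's adapter, e.g. `experts = nn.ModuleList(nn.Linear(d, r, bias=False) for _ in range(3))` followed by `MASALayer(experts, d_out=d)` per layer. Because B is zero-initialized, the adapted model starts out exactly equal to the base model.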
Problem

Research questions and friction points this paper is trying to address.

Addressing LoRA's representational bottleneck with a multi-A architecture
Enhancing feature adaptation for improved downstream task performance
Maintaining parameter efficiency through asymmetric expert sharing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-A expert ensemble captures diverse features
Asymmetric sharing ensures parameter efficiency across layers
Single B-matrix integrates features for downstream tasks
👥 Authors
Qin Dong
East China Normal University
Yuntian Tang
East China Normal University
Heming Jia
Sanming University, Professor (Meta-heuristic Optimization Algorithms)
Yunhang Shen
Xiamen University
Bohan Jia
East China Normal University (MLLM, LLM, AIGC)
Wenxuan Huang
CUHK & ECNU (Artificial General Intelligence, MLLM, LLM, AIGC, Model Acceleration)
Lianyue Zhang
East China Normal University
Jiao Xie
East China Normal University
Shaohui Lin
East China Normal University