🤖 AI Summary
Existing parameter-efficient fine-tuning (PEFT) methods for Mamba architectures over-rely on adapting the State Space Model (SSM) module, neglecting the critical role of the Projectors in transfer learning. Method: This work first identifies the Projectors as the dominant components for cross-task adaptation and proposes ProDiaL, a novel PEFT method that introduces learnable diagonal-centric linear transformation matrices applied to the pretrained Projectors, enabling targeted adaptation without directly updating the Projector weights while all SSM parameters remain frozen. Contribution/Results: ProDiaL decouples Projector optimization from SSM learning, reducing trainable parameters by >99% (<1% of total). On both vision and language Mamba models, it achieves performance on par with full fine-tuning at minimal computational cost, demonstrating strong generalization. As the first projector-centric PEFT paradigm for Mamba, ProDiaL challenges the prevailing SSM-centric design philosophy and establishes a new direction for efficient Mamba adaptation.
📄 Abstract
Despite the growing interest in the Mamba architecture as a potential replacement for the Transformer architecture, parameter-efficient fine-tuning (PEFT) approaches for Mamba remain largely unexplored. In our study, we present two key insights for PEFT in the Mamba architecture: (1) While state-space models (SSMs) have been regarded as the cornerstone of the Mamba architecture, and are thus expected to play the primary role in transfer learning, our findings reveal that Projectors -- not SSMs -- are the predominant contributors to transfer learning. (2) Based on this observation, we propose a novel PEFT method specialized for the Mamba architecture: Projector-targeted Diagonal-centric Linear Transformation (ProDiaL). ProDiaL optimizes only the pretrained Projectors for new tasks through diagonal-centric linear transformation matrices, without directly fine-tuning the Projector weights. This targeted approach allows efficient task adaptation, utilizing less than 1% of the total parameters, and exhibits strong performance across both vision and language Mamba models, highlighting its versatility and effectiveness.
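To make the mechanism concrete, here is a minimal NumPy sketch of the core idea: a frozen pretrained projector weight composed with a small learnable transformation, so only the transformation's parameters are trained. The sketch assumes, for illustration only, that the transformation is a pure diagonal matrix applied on the output side of the projector; the matrix sizes, the placement, and the identity initialization are hypothetical choices, not the paper's exact formulation (which uses diagonal-centric, not strictly diagonal, matrices).

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen pretrained projector weight (hypothetical sizes: d_out x d_in).
d_out, d_in = 64, 16
W = rng.standard_normal((d_out, d_in))  # never updated

# ProDiaL-style adapter (sketch): a learnable diagonal transformation D
# composed with the frozen projector. Initialized to the identity so
# the adapted model starts from the pretrained behavior.
diag = np.ones(d_out)  # the only trainable parameters

def adapted_projector(x):
    # Equivalent to (D @ W) @ x with D = np.diag(diag),
    # without materializing the full diagonal matrix.
    return diag[:, None] * (W @ x)

x = rng.standard_normal(d_in)
# At initialization the adapter is a no-op: D = I.
assert np.allclose(adapted_projector(x), W @ x)

# Parameter efficiency: d_out trainable values instead of d_out * d_in.
print(f"trainable fraction: {d_out / (d_out * d_in):.2%}")  # → 6.25%
```

Training would update only `diag` (by gradient descent in practice), leaving `W` and all SSM parameters untouched, which is what keeps the trainable-parameter count below 1% of the model.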