🤖 AI Summary
Existing parameter-efficient fine-tuning (PEFT) methods, LoRA in particular, adapt poorly to state space models (SSMs) such as Mamba because of structural mismatches with the SSM's core components. Method: This work is the first to systematically characterize the structural sensitivity of SSMs to mainstream PEFT techniques, and it proposes Sparse Dimension Tuning (SDT), the first SSM-aware PEFT method. SDT applies sparse, dimension-wise updates directly to key SSM parameters (e.g., the A, B, and C matrices), enabling fine-grained, structure-aware optimization, and it further integrates with LoRA into a unified SDT+LoRA framework. Contribution/Results: Extensive experiments on language modeling tasks show that SDT achieves state-of-the-art performance with under 0.5% parameter overhead, significantly outperforming LoRA and other baselines while offering a superior accuracy-efficiency trade-off.
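The sparse, dimension-wise update described above can be illustrated with a minimal sketch. This is not the paper's implementation: the dimension-selection criterion, shapes, and learning rate below are all hypothetical, and the real method operates inside a trained Mamba model rather than on toy arrays.

```python
import numpy as np

def sdt_step(param, grad, tuned_dims, lr=0.1):
    """One hypothetical SDT-style update: apply a gradient step only to
    a sparse subset of channel dimensions (rows) of an SSM parameter,
    leaving all other rows frozen. How `tuned_dims` is chosen is an
    assumption; the selection rule is not specified here."""
    updated = param.copy()
    updated[tuned_dims] -= lr * grad[tuned_dims]
    return updated

# Toy SSM state matrix A with d=4 channels and state size n=3.
A = np.ones((4, 3))
grad = np.full((4, 3), 0.5)
tuned_dims = [0, 2]                      # sparse subset of channels to tune
A_new = sdt_step(A, grad, tuned_dims)    # rows 0 and 2 move; 1 and 3 stay
```

Only the selected rows are trainable, which is why the parameter overhead stays small relative to full fine-tuning.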
📝 Abstract
Deep State Space Models (SSMs), such as Mamba (Gu & Dao, 2024), have become powerful tools for language modeling, offering high performance and linear scalability with sequence length. However, the application of parameter-efficient fine-tuning (PEFT) methods to SSM-based models remains underexplored. We start by investigating two fundamental questions about existing PEFT methods: (i) How do they perform on SSM-based models? (ii) Which parameters should they target for optimal results? Our analysis shows that LoRA and its variants consistently outperform all other PEFT methods. While LoRA is effective for linear projection matrices, it fails on SSM modules, yet it still outperforms the other methods applicable to SSMs, indicating their limitations. This underscores the need for a specialized SSM tuning approach. To address this, we propose Sparse Dimension Tuning (SDT), a PEFT method tailored for SSM modules. Combining SDT for SSMs with LoRA for linear projection matrices, we achieve state-of-the-art performance across extensive experiments.
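The division of labor in the combined scheme, LoRA on linear projections and sparse dimension updates on SSM parameters, can be sketched as follows. All names, shapes, and the choice of tuned channels are illustrative assumptions, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_state, rank = 8, 4, 2

# Frozen pretrained weights (toy stand-ins).
W_proj = rng.standard_normal((d_model, d_model))  # a linear projection
A_ssm = rng.standard_normal((d_model, d_state))   # an SSM parameter

# LoRA factors for the projection (trainable, rank << d_model).
# Zero-initializing one factor keeps the model unchanged at the start.
lora_A = np.zeros((rank, d_model))
lora_B = rng.standard_normal((d_model, rank)) * 0.01

# SDT-style delta restricted to a sparse subset of channel dimensions
# (which channels to tune is a hypothetical choice here).
tuned_dims = np.array([1, 5])
delta = np.zeros_like(A_ssm)
delta[tuned_dims] = 0.1  # only these rows carry trainable updates

def effective_weights():
    """Effective weights under the combined scheme: a low-rank LoRA
    update on the projection, a sparse row-wise update on the SSM
    parameter; the frozen base weights are never modified."""
    return W_proj + lora_B @ lora_A, A_ssm + delta

W_eff, A_eff = effective_weights()
# At initialization W_eff equals W_proj (lora_A is zero), while
# A_eff differs from A_ssm only on the tuned channels.
```

Keeping the base weights frozen and storing only the LoRA factors plus the sparse deltas is what makes the combined approach parameter-efficient.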