MoPE: Mixture of Prompt Experts for Parameter-Efficient and Scalable Multimodal Fusion

📅 2024-03-14

📈 Citations: 2

✨ Influential: 0

🤖 AI Summary

Prompt-based multimodal fusion methods are parameter-efficient, but their limited adaptivity and expressiveness often lead to suboptimal performance. To address this, the paper proposes the Mixture of Prompt Experts (MoPE): it decomposes the standard prompt into expert prompts and routes them dynamically for each instance, using multimodal pairing priors to pick the most effective prompt. The authors also study regularization terms for expert routing, which lead to emergent expert specialization and improve adaptiveness and interpretability. With only 0.8% of the trainable parameters (roughly 125× fewer than fine-tuning), MoPE achieves state-of-the-art performance for prompt fusion on six multimodal datasets spanning four modalities, matching or even surpassing fine-tuning.

📝 Abstract

Despite the demonstrated parameter efficiency of prompt-based multimodal fusion methods, their limited adaptivity and expressiveness often result in suboptimal performance compared to other tuning approaches. In this paper, we introduce the Mixture of Prompt Experts (MoPE), the first technique designed to overcome these limitations by decomposing standard prompts to capture instance-level features adaptively. Building on this decomposition, MoPE enhances prompt fusion's expressiveness by leveraging multimodal pairing priors to route the most effective prompt for each instance dynamically. Compared to vanilla prompting, our MoPE-based fusion method exhibits greater expressiveness, scaling more effectively with the training data and the overall number of trainable parameters. We also investigate regularization terms for expert routing, which lead to emergent expert specialization with enhanced adaptiveness and interpretability. Extensive experiments across six multimodal datasets spanning four modalities demonstrate state-of-the-art performance for prompt fusion, matching or even surpassing the performance of fine-tuning while requiring only 0.8% of the trainable parameters. Project homepage: https://github.com/songrise/MoPE
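
The routing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the linear router, the softmax mixing, and every name here (`route_prompt`, `router_weights`, `expert_prompts`, `pair_feature`) are assumptions made for illustration, and the dimensions are arbitrary. It only shows how a feature from the paired modality could produce per-instance weights over a set of expert prompts.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_prompt(pair_feature, router_weights, expert_prompts):
    """Mix expert prompts for one instance (hypothetical linear router).

    pair_feature:   feature vector from the paired modality, length d
    router_weights: one weight vector per expert, shape k x d
    expert_prompts: one prompt vector per expert, shape k x p
    Returns (routing weights, mixed prompt vector).
    """
    # One routing score per expert: dot product of its weights with the feature.
    scores = [sum(w * x for w, x in zip(row, pair_feature)) for row in router_weights]
    weights = softmax(scores)
    prompt_dim = len(expert_prompts[0])
    # Instance-specific prompt: weighted sum of the expert prompts.
    mixed = [
        sum(weights[k] * expert_prompts[k][j] for k in range(len(expert_prompts)))
        for j in range(prompt_dim)
    ]
    return weights, mixed


# Toy example with random parameters (assumed sizes, not from the paper).
rng = random.Random(0)
num_experts, feat_dim, prompt_dim = 4, 8, 6
router_weights = [[rng.gauss(0, 1) for _ in range(feat_dim)] for _ in range(num_experts)]
expert_prompts = [[rng.gauss(0, 1) for _ in range(prompt_dim)] for _ in range(num_experts)]
pair_feature = [rng.gauss(0, 1) for _ in range(feat_dim)]

weights, dynamic_prompt = route_prompt(pair_feature, router_weights, expert_prompts)
```

In this sketch a different `pair_feature` yields different routing weights, so each instance gets its own prompt. The paper's actual router design and its routing regularization are described in the project repository linked above.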

Problem

Research questions and friction points this paper is trying to address.

Prompt-based Multimodal Fusion

Parameter Efficiency

Adaptability and Expressiveness

Innovation

Methods, ideas, or system contributions that make the work stand out.

MoPE

Multimodal Fusion

Prompt-based Methods
