🤖 AI Summary
To address cross-modal data-distribution heterogeneity when fine-tuning multimodal large language models (MLLMs) under federated learning (FL), this paper proposes FedMLLM, a general-purpose FL framework for MLLMs. Methodologically, it integrates classic FL paradigms with two modality-agnostic strategies: modality-agnostic feature alignment and lightweight heterogeneity modeling. It also introduces the first comprehensive benchmark covering four multimodal scenarios and over ten types of modality heterogeneity. Extensive experiments on two lightweight MLLMs, five datasets spanning three domains, and two downstream tasks demonstrate that FedMLLM significantly mitigates the performance degradation induced by modality heterogeneity, improving model generalization and robustness. The framework provides a scalable, highly adaptive paradigm for privacy-sensitive multimodal federated fine-tuning.
📝 Abstract
Multimodal Large Language Models (MLLMs) have made significant advancements, demonstrating powerful capabilities in processing and understanding multimodal data. Fine-tuning MLLMs with Federated Learning (FL) expands the scope of training data by including private data sources, thereby enhancing their practical applicability in privacy-sensitive domains. However, current research remains in its early stages, particularly in addressing the **multimodal heterogeneities** of real-world applications. In this paper, we introduce a benchmark to evaluate the performance of federated fine-tuning of MLLMs across various multimodal heterogeneous scenarios, laying the groundwork for future research in the field. Our benchmark includes two lightweight MLLMs, two downstream tasks, three evaluation metrics, and five datasets across three domains, along with six comparison baselines, covering over ten types of modality heterogeneities across four multimodal scenarios. To address the challenges posed by multimodal heterogeneity, we develop a general FedMLLM framework that integrates classic FL methods alongside two modality-agnostic strategies. Extensive experimental results show that our proposed FL paradigm improves the performance of MLLMs by broadening the range of training data and mitigating multimodal heterogeneity. Code is available in the supplementary materials.
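As a rough illustration of the classic FL building block such a framework rests on, the sketch below shows a FedAvg-style weighted aggregation of per-client parameter updates. This is a generic sketch only: the function name, the use of LoRA-style adapter tensors, and the toy client data are assumptions for illustration, not FedMLLM's actual strategies.

```python
import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    """FedAvg-style aggregation: average each parameter across clients,
    weighted by the number of local training samples per client.

    This is the classic FL step; it does not reproduce FedMLLM's
    modality-agnostic strategies.
    """
    total = sum(client_sizes)
    aggregated = {}
    for name in client_weights[0]:
        # Sum each client's tensor, scaled by its share of the data.
        aggregated[name] = sum(
            (n / total) * w[name]
            for w, n in zip(client_weights, client_sizes)
        )
    return aggregated

# Hypothetical example: two clients holding one LoRA-style adapter tensor.
clients = [
    {"lora_A": np.array([1.0, 2.0])},
    {"lora_A": np.array([3.0, 4.0])},
]
sizes = [1, 3]  # client 2 has 3x the data, so it dominates the average
agg = fedavg_aggregate(clients, sizes)
print(agg["lora_A"])
```

Weighting by local dataset size is the standard FedAvg choice; frameworks built on it typically keep this aggregation step and vary what is communicated (full weights, adapters, or gradients).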