FedMLLM: Federated Fine-tuning MLLM on Multimodal Heterogeneity Data

📅 2024-11-22
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of cross-modal data distribution heterogeneity in fine-tuning multimodal large language models (MLLMs) under federated learning (FL), this paper proposes FedMLLM—a general-purpose FL framework for MLLMs. Methodologically, it integrates classical FL paradigms with two modality-agnostic strategies: modality-agnostic feature alignment and lightweight heterogeneity modeling. Furthermore, it introduces the first comprehensive benchmark framework covering four major cross-modal scenarios and over ten types of modality heterogeneity. Extensive experiments on two lightweight MLLMs, five cross-domain datasets, and two downstream tasks demonstrate that FedMLLM significantly mitigates performance degradation induced by modality heterogeneity, enhancing model generalization and robustness. The framework provides a scalable, highly adaptive paradigm for privacy-sensitive multimodal federated fine-tuning.

📝 Abstract
Multimodal Large Language Models (MLLMs) have made significant advancements, demonstrating powerful capabilities in processing and understanding multimodal data. Fine-tuning MLLMs with Federated Learning (FL) allows for expanding the training data scope by including private data sources, thereby enhancing their practical applicability in privacy-sensitive domains. However, current research remains in the early stage, particularly in addressing the multimodal heterogeneities in real-world applications. In this paper, we introduce a benchmark to evaluate the performance of federated fine-tuning of MLLMs across various multimodal heterogeneous scenarios, laying the groundwork for future research in the field. Our benchmark includes two lightweight MLLMs, two downstream tasks, three evaluation metrics, and five datasets across three domains, along with six comparison baselines, covering over ten types of modality heterogeneities across four multimodal scenarios. To address the challenges posed by multimodal heterogeneity, we develop a general FedMLLM framework that integrates classic FL methods alongside two modality-agnostic strategies. Extensive experimental results show that our proposed FL paradigm improves the performance of MLLMs by broadening the range of training data and mitigating multimodal heterogeneity. Code is available in supplementary materials.
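The abstract notes that FedMLLM integrates classic FL methods. The best-known of these is federated averaging (FedAvg): each client fine-tunes locally on private data, and the server averages the resulting weights in proportion to local dataset size. The sketch below is a minimal illustration of that aggregation step only; the function name and numbers are illustrative and not taken from the paper.

```python
# Minimal FedAvg-style aggregation sketch. Weights are represented as
# flat lists of floats for clarity; a real MLLM would use tensors
# (e.g. a PyTorch state_dict).

def fedavg(client_weights, client_sizes):
    """Average client model weights, weighted by local dataset size."""
    total = sum(client_sizes)
    num_params = len(client_weights[0])
    global_weights = [0.0] * num_params
    for weights, size in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            # Each client contributes proportionally to its data share.
            global_weights[i] += w * (size / total)
    return global_weights

# Two clients: the first holds 3x as much data as the second.
clients = [[1.0, 2.0], [5.0, 6.0]]
sizes = [3, 1]
print(fedavg(clients, sizes))  # [2.0, 3.0]
```

In practice, parameter-efficient fine-tuning (e.g. adapters) keeps the exchanged weight vectors small, which is why the benchmark's lightweight MLLMs are a natural fit for this loop.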
Problem

Research questions and friction points this paper is trying to address.

Address multimodal heterogeneity in federated learning.
Evaluate federated fine-tuning of MLLMs across diverse scenarios.
Develop a framework to enhance MLLM performance with FL.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated Learning for MLLM fine-tuning
Benchmark for multimodal heterogeneous scenarios
Modality-agnostic strategies in FedMLLM framework
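To make the modality-heterogeneity setting concrete: clients may hold different modality combinations (e.g. image+text vs. text-only), so a given parameter group is only trained by some clients. One common way such settings are handled is to average each parameter group over just the clients that updated it. This is a hedged sketch of that general idea under assumed names ("image_enc", "text_enc"); it does not reproduce the paper's specific modality-agnostic strategies.

```python
# Aggregate per parameter group, averaging only over the clients that
# actually hold (and therefore trained) that group. All names are
# hypothetical placeholders for modality-specific encoder weights.

def aggregate_by_modality(client_updates, client_sizes):
    """client_updates: list of dicts mapping param-group name -> value."""
    groups = {name for upd in client_updates for name in upd}
    agg = {}
    for name in groups:
        # Keep only clients that trained this parameter group.
        holders = [(upd[name], size)
                   for upd, size in zip(client_updates, client_sizes)
                   if name in upd]
        total = sum(size for _, size in holders)
        agg[name] = sum(v * size for v, size in holders) / total
    return agg

# Client A has image+text data; client B has text only, so it never
# touches the image encoder.
updates = [{"image_enc": 2.0, "text_enc": 1.0},
           {"text_enc": 3.0}]
result = aggregate_by_modality(updates, [1, 1])
print(result["text_enc"], result["image_enc"])  # 2.0 2.0
```

The design choice here is that missing modalities simply leave the corresponding global parameters driven by the clients that have them, rather than diluting them with untrained values.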
👥 Authors
Binqian Xu — Nanjing University of Science and Technology
Xiangbo Shu — Nanjing University of Science and Technology
Haiyang Mei — National University of Singapore, Dalian University of Technology, ETH Zurich (Computer Vision, Neuroinformatics)
Guosen Xie — Nanjing University of Science and Technology
Basura Fernando — Scientist at A*STAR Singapore, Assistant Professor at NTU (Visual Reasoning, Action Prediction, Action Recognition, Transfer Learning, Embodied AI)
Mike Zheng Shou — Show Lab, National University of Singapore
Jinhui Tang — Nanjing University of Science and Technology