🤖 AI Summary
This paper addresses two practical challenges in federated multimodal fine-tuning: client-side resource heterogeneity (inconsistent LoRA ranks across clients) and partial modality availability. To this end, the authors propose FediLoRA, a framework that integrates LoRA, federated learning, and inter-layer parameter fusion. Its core innovations are: (1) dimension-level reweighted aggregation, which mitigates the information dilution inherent in naive parameter averaging; and (2) a lightweight layer-wise model editing and modality compensation mechanism that enables local rank adaptation and recovery from missing modalities. Evaluated on three multimodal benchmarks, FediLoRA consistently outperforms state-of-the-art methods, particularly under modality-incomplete settings, achieving faster convergence and greater robustness while preserving both global model consistency and client-specific personalization.
📝 Abstract
Foundation models have demonstrated remarkable performance across a wide range of tasks, yet their large parameter sizes pose challenges for practical deployment, especially in decentralized environments. Parameter-efficient fine-tuning (PEFT) methods such as Low-Rank Adaptation (LoRA) reduce local computing and memory overhead, making them attractive for federated learning. However, existing federated LoRA methods typically assume uniform rank configurations and unimodal inputs, overlooking two key real-world challenges: (1) clients with heterogeneous resources that require different LoRA ranks, and (2) multimodal data with potentially missing modalities. In this work, we propose FediLoRA, a simple yet effective framework for federated multimodal fine-tuning under heterogeneous LoRA ranks and missing modalities. FediLoRA introduces a dimension-wise aggregation strategy that reweights LoRA updates so that aggregation does not dilute information. It also includes a lightweight layer-wise model editing method that selectively incorporates global parameters to repair local components, improving both client and global model performance. Experimental results on three multimodal benchmark datasets demonstrate that FediLoRA achieves superior performance over competitive baselines in both global and personalized settings, particularly in the presence of modality incompleteness.
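The abstract does not give pseudocode, but the dimension-wise aggregation idea can be sketched. The minimal NumPy sketch below assumes each client k holds LoRA factors A_k of shape (r_k, d_in) and B_k of shape (d_out, r_k), and reads "dimension-wise reweighting" as: each rank dimension is averaged only over the clients whose local rank actually covers it, so zero-padded slots never dilute the mean. The function name, argument layout, and data-size weighting are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def dimension_wise_aggregate(client_As, client_Bs, client_weights):
    """Aggregate heterogeneous-rank LoRA factors one rank dimension at a time.

    client_As: list of (r_k, d_in) arrays; client_Bs: list of (d_out, r_k) arrays.
    client_weights: per-client weights (e.g., local dataset sizes).
    Each rank dimension is averaged only over clients that possess it, so
    zero-padding to the maximum rank does not dilute the aggregated update.
    """
    r_max = max(A.shape[0] for A in client_As)
    d_in = client_As[0].shape[1]
    d_out = client_Bs[0].shape[0]

    A_global = np.zeros((r_max, d_in))
    B_global = np.zeros((d_out, r_max))

    for dim in range(r_max):
        # Clients whose local rank covers this dimension.
        holders = [k for k, A in enumerate(client_As) if A.shape[0] > dim]
        if not holders:
            continue
        # Renormalize weights over the contributing clients only, instead of
        # dividing by the total client count as naive padded averaging would.
        w = np.array([client_weights[k] for k in holders], dtype=float)
        w /= w.sum()
        A_global[dim] = sum(wk * client_As[k][dim] for wk, k in zip(w, holders))
        B_global[:, dim] = sum(wk * client_Bs[k][:, dim] for wk, k in zip(w, holders))

    return A_global, B_global
```

Under this reading, a client with local rank r_k would then take the first r_k rows of A_global and columns of B_global as its personalized update, which is one plausible way the framework could reconcile a shared global model with heterogeneous local ranks.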