🤖 AI Summary
To address the dual heterogeneity (diverse model architectures and downstream tasks) in Hybrid Heterogeneous Federated Fine-Tuning (HHFFT), which causes dimension mismatch and multi-task knowledge interference, this paper proposes the first systematic solution: (1) a sparsified triple matrix decomposition that yields aggregable low-rank representations of heterogeneous LoRA parameters; (2) a relation-guided layer alignment mechanism that mitigates architectural discrepancies across clients; and (3) an alternating task-knowledge disentanglement framework that separates shared and task-specific knowledge. Theoretically, the algorithm is proven to converge at rate O(1/√T). Empirically, the method achieves up to 15.4% higher accuracy than state-of-the-art approaches across multiple benchmarks, significantly improving cross-device knowledge sharing efficiency and personalization performance.
📝 Abstract
Unlike existing federated fine-tuning (FFT) methods for foundation models, hybrid heterogeneous federated fine-tuning (HHFFT) is an under-explored scenario in which clients exhibit dual heterogeneity in model architectures and downstream tasks. This hybrid heterogeneity introduces two significant challenges: 1) heterogeneous matrix aggregation, where clients adopt different large-scale foundation models based on their task requirements and resource limitations, leading to dimensional mismatches during LoRA parameter aggregation; and 2) multi-task knowledge interference, where local shared parameters, trained with both task-shared and task-specific knowledge, cannot ensure that only task-shared knowledge is transferred between clients. To address these challenges, we propose H2Tune, a federated foundation-model fine-tuning framework for hybrid heterogeneity. H2Tune consists of three key components: (i) sparsified triple matrix decomposition, which aligns hidden dimensions across clients by constructing rank-consistent middle matrices, with adaptive sparsification based on client resources; (ii) relation-guided matrix layer alignment, which handles heterogeneous layer structures and representation capabilities; and (iii) an alternating task-knowledge disentanglement mechanism, which decouples the shared and task-specific knowledge in local model parameters through alternating optimization. Theoretical analysis proves a convergence rate of O(1/√T). Extensive experiments show our method achieves up to 15.4% accuracy improvement over state-of-the-art baselines. Our code is available at https://anonymous.4open.science/r/H2Tune-1407.
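To make the dimension-alignment idea concrete, here is a minimal NumPy sketch, not the paper's exact algorithm: it assumes each client factors its LoRA update as ΔW = B·M·A, where the outer factors B and A absorb client-specific hidden dimensions and the middle matrix M has a server-agreed rank r* that is identical across clients, so only M needs to be aggregated. All names (`r_star`, `client_update`) and the plain averaging step are illustrative assumptions, and the sketch omits the paper's sparsification, layer alignment, and disentanglement components.

```python
import numpy as np

r_star = 4  # common rank for the middle matrix, agreed by the server (assumed)

def client_update(hidden_dim, rng):
    """Return one client's triple-factor LoRA update B @ M @ A.

    B and A depend on the client's own hidden dimension, so they stay local;
    only the r_star x r_star middle matrix M is shared for aggregation.
    """
    B = rng.standard_normal((hidden_dim, r_star))   # (d_i, r*)
    M = rng.standard_normal((r_star, r_star))       # (r*, r*), dimension-consistent
    A = rng.standard_normal((r_star, hidden_dim))   # (r*, d_i)
    return B, M, A

rng = np.random.default_rng(0)
# Clients running foundation models with different hidden sizes.
clients = [client_update(d, rng) for d in (768, 1024, 4096)]

# The server aggregates only the middle matrices: they all have shape
# (r*, r*), so averaging is well-defined despite heterogeneous models.
M_global = np.mean([M for _, M, _ in clients], axis=0)

# Each client rebuilds a full update in its own dimensions.
deltas = [B @ M_global @ A for B, _, A in clients]
```

Because `B @ M_global @ A` has shape `(d_i, d_i)` for each client, the aggregated knowledge in `M_global` is re-expanded into whatever hidden size that client's model uses, sidestepping the dimensional mismatch described above.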