ML-ECS: A Collaborative Multimodal Learning Framework for Edge-Cloud Synergies

📅 2026-02-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenges of multimodal collaborative learning in edge–cloud synergy environments, where heterogeneity in data modalities and model architectures hinders effective joint training. To this end, the authors propose ML-ECS, a framework that enables efficient and privacy-preserving federated learning by integrating cross-modal contrastive learning (CCL), adaptive multimodal tuning (AMT), modality-aware model aggregation (MMA), and small language model–enhanced CCL (SE-CCL). Leveraging a low-rank LoRA-based parameter communication mechanism, ML-ECS shares knowledge robustly even when modalities are missing. Experimental results demonstrate that ML-ECS significantly outperforms existing methods across diverse multimodal tasks, achieving Rouge-LSum improvements of 5.44%–12.08% while incurring communication overhead of only 0.65% of the total model parameters.
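The 0.65% figure follows from exchanging only the low-rank LoRA factors rather than full model weights. The paper's own code is not reproduced here; the following is a minimal PyTorch sketch of how such a low-rank exchange could look (the class name `LoRALinear`, the `communicable_state` helper, and all dimension/rank choices are illustrative assumptions, not the authors' implementation):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank (LoRA) update.

    Only the small factors A and B leave the device, which is the
    mechanism that keeps communication to a small fraction of the
    full parameter volume.
    """

    def __init__(self, in_dim: int, out_dim: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim)
        for p in self.base.parameters():
            p.requires_grad_(False)  # backbone stays frozen on-device
        self.lora_A = nn.Parameter(torch.randn(rank, in_dim) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_dim, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = x W^T + b + scale * x A^T B^T
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)

    def communicable_state(self) -> dict:
        # The only tensors that would be uploaded to the cloud.
        return {"lora_A": self.lora_A.detach(), "lora_B": self.lora_B.detach()}

layer = LoRALinear(4096, 4096, rank=8)
full = sum(p.numel() for p in layer.base.parameters())
low_rank = sum(t.numel() for t in layer.communicable_state().values())
print(f"communicated fraction: {low_rank / full:.2%}")  # ~0.39% for this toy layer
```

For a 4096×4096 layer at rank 8, the exchanged factors amount to well under 1% of the frozen weight, which is the order of magnitude the paper reports for the whole model.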

📝 Abstract
Edge-cloud synergies provide a promising paradigm for privacy-preserving deployment of foundation models, where lightweight on-device models adapt to domain-specific data and cloud-hosted models coordinate knowledge sharing. However, in real-world edge environments, collaborative multimodal learning is challenged by modality heterogeneity (different modality combinations across domains) and model-structure heterogeneity (different modality-specific encoders and fusion modules). To address these issues, we propose ML-ECS, a collaborative multimodal learning framework that enables joint training between a server-based model and heterogeneous edge models. The framework consists of four components: (1) cross-modal contrastive learning (CCL) to align modality representations in a shared latent space, (2) adaptive multimodal tuning (AMT) to preserve domain-specific knowledge from local datasets, (3) modality-aware model aggregation (MMA) to aggregate robustly while mitigating noise caused by missing modalities, and (4) SLM-enhanced CCL (SE-CCL) to facilitate bidirectional knowledge transfer between cloud and edge. Experimental results on various multimodal tasks show that ML-ECS consistently outperforms state-of-the-art baselines under varying modality availability, achieving improvements of 5.44% to 12.08% in Rouge-LSum and improving both client- and server-side performance. In addition, by communicating only low-rank LoRA parameters and fused representations, ML-ECS achieves high communication efficiency, requiring only 0.65% of the total parameter volume.
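As a concrete reference point for component (1), aligning modality representations in a shared latent space is commonly implemented as a symmetric InfoNCE objective over paired embeddings. Below is a minimal, self-contained sketch of that standard formulation; the function name, temperature value, and toy batch are illustrative assumptions and not taken from the paper:

```python
import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(z_a: torch.Tensor,
                                 z_b: torch.Tensor,
                                 temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over paired embeddings from two modalities.

    z_a, z_b: (batch, dim) projections of the *same* samples from two
    modality-specific encoders; row i of each forms a positive pair,
    and all other rows serve as in-batch negatives.
    """
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.T / temperature        # (batch, batch) cosine similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)
    # Pull matched pairs together in both directions (a -> b and b -> a).
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.T, targets))

# Toy usage: e.g. text and image projections of the same 32 samples.
loss = cross_modal_contrastive_loss(torch.randn(32, 256), torch.randn(32, 256))
```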
Problem

Research questions and friction points this paper is trying to address.

modality heterogeneity
model-structure heterogeneity
edge-cloud synergy
collaborative multimodal learning
foundation models
Innovation

Methods, ideas, or system contributions that make the work stand out.

edge-cloud synergy
multimodal learning
modality heterogeneity
federated learning
communication efficiency