TACTFL: Temporal Contrastive Training for Multi-modal Federated Learning with Similarity-guided Model Aggregation

📅 2025-09-22

📈 Citations: 0

✨ Influential: 0

career value

181K/year

🤖 AI Summary

Addressing the dual challenges of label scarcity and multimodal data heterogeneity in federated learning, this paper proposes the first semi-supervised federated learning framework tailored for multimodal time-series data. Methodologically, it innovatively integrates modality-agnostic temporal contrastive learning with cross-modal representation alignment and introduces a similarity-guided dynamic aggregation strategy based on representation consistency to mitigate client-level semantic drift. Technically, the framework unifies self-supervised pretraining, federated averaging optimization, and modality-adaptive weight aggregation, enabling joint modeling of video, audio, and wearable sensor data. Extensive experiments on benchmarks such as UCF101 demonstrate significant improvements over state-of-the-art methods: using only 10% labeled data, the framework achieves 68.48% top-1 accuracy—outperforming the FedOpt baseline by 33.13 percentage points.

Technology Category

Application Category

📝 Abstract

Real-world federated learning faces two key challenges: limited access to labelled data and the presence of heterogeneous multi-modal inputs. This paper proposes TACTFL, a unified framework for semi-supervised multi-modal federated learning. TACTFL introduces a modality-agnostic temporal contrastive training scheme that conducts representation learning from unlabelled client data by leveraging temporal alignment across modalities. However, as clients perform self-supervised training on heterogeneous data, local models may diverge semantically. To mitigate this, TACTFL incorporates a similarity-guided model aggregation strategy that dynamically weights client models based on their representational consistency, promoting global alignment. Extensive experiments across diverse benchmarks and modalities, including video, audio, and wearable sensors, demonstrate that TACTFL achieves state-of-the-art performance. For instance, on the UCF101 dataset with only 10% labelled data, TACTFL attains 68.48% top-1 accuracy, significantly outperforming the FedOpt baseline of 35.35%. Code will be released upon publication.

Problem

Research questions and friction points this paper is trying to address.

Addressing limited labeled data in multi-modal federated learning systems

Managing semantic divergence from heterogeneous client data training

Improving global model alignment across diverse modalities and clients

Innovation

Methods, ideas, or system contributions that make the work stand out.

Modality-agnostic temporal contrastive training for representation learning

Similarity-guided model aggregation for dynamic client weighting

Unified semi-supervised framework for multi-modal federated learning

🔎 Similar Papers

FedMAC: Tackling Partial-Modality Missing in Federated Learning with Cross-Modal Aggregation and Contrastive Regularization

2024-10-04IEEE International Symposium on Network Computing and ApplicationsCitations: 3

💼 Related Jobs

Data Scientist

Schlumberger / SLB

Houston, United States

Research Scientist Intern, Multimodal AI (PhD)