🤖 AI Summary
To address the challenges of cold-start adaptation, privacy sensitivity, and non-independent-and-identically-distributed (Non-IID) data in federated learning (FL), this paper proposes FedJETs, a Mixture-of-Experts (MoE) framework embedded in an FL setup. Rather than relying on explicit task partitioning, FedJETs jointly trains distributed expert modules and a dynamic gating function that routes each input to the most relevant expert(s): client diversity is leveraged to train experts specialized on different subsets of classes, while a pretrained common expert sharpens the gating function's routing decisions on the fly. The result is just-in-time personalization that requires no per-client fine-tuning, making the method suitable for fresh or privacy-constrained clients. Evaluated on standard FL benchmarks, FedJETs improves accuracy by up to 18% over state-of-the-art baselines while maintaining competitive zero-shot performance, handling non-homogeneous data distributions, and scaling efficiently with the number of clients.
📄 Abstract
One of the goals in Federated Learning (FL) is to create personalized models that can adapt to the context of each participating client, while utilizing knowledge from a shared global model. Yet personalization often requires a fine-tuning step using clients' labeled data in order to achieve good performance. This may not be feasible in scenarios where incoming clients are fresh and/or have privacy concerns. It remains an open question, then, how to achieve just-in-time personalization in these scenarios. We propose FedJETs, a novel solution that uses a Mixture-of-Experts (MoE) framework within an FL setup. Our method leverages the diversity of the clients to train specialized experts on different subsets of classes, and a gating function to route the input to the most relevant expert(s). Our gating function harnesses the knowledge of a pretrained model (the common expert) to enhance its routing decisions on-the-fly. As a highlight, our approach can improve accuracy up to 18% in state-of-the-art FL settings, while maintaining competitive zero-shot performance. In practice, our method can handle non-homogeneous data distributions, scale more efficiently, and improve state-of-the-art performance on common FL benchmarks.
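
To make the routing mechanism concrete, below is a minimal PyTorch sketch of the idea described in the abstract: a frozen pretrained common expert produces features, a small gating network scores the class-specialized experts, and each input's output is a weighted combination of its top-k experts. This is not the authors' implementation; all module names, dimensions, and the dense top-k weighting are illustrative assumptions.

```python
# Minimal sketch of the FedJETs-style routing idea (NOT the authors' code):
# a frozen pretrained "common expert" produces features that a small gating
# network uses to weight the top-k specialized experts for each input.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEWithCommonExpert(nn.Module):
    def __init__(self, common_expert: nn.Module, experts: list[nn.Module],
                 feat_dim: int, top_k: int = 2):
        super().__init__()
        self.common_expert = common_expert  # pretrained; frozen during training
        for p in self.common_expert.parameters():
            p.requires_grad = False
        self.experts = nn.ModuleList(experts)          # class-specialized experts
        self.gate = nn.Linear(feat_dim, len(experts))  # routing score per expert
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Features from the common expert guide the routing decision.
        feats = self.common_expert(x)                 # (batch, feat_dim)
        scores = F.softmax(self.gate(feats), dim=-1)  # (batch, num_experts)
        topk_w, topk_idx = scores.topk(self.top_k, dim=-1)
        # Keep only the top-k weights; non-selected experts get zero weight.
        mask = torch.zeros_like(scores).scatter(1, topk_idx, topk_w)
        # For readability, every expert runs on the full batch here; a real
        # system would dispatch each sample only to its selected experts.
        expert_outs = torch.stack([e(x) for e in self.experts], dim=1)
        return (mask.unsqueeze(-1) * expert_outs).sum(dim=1)

# Toy usage (hypothetical sizes): 4 experts, shared features, 2-way routing.
common = nn.Sequential(nn.Flatten(), nn.Linear(32, 16), nn.ReLU())
experts = [nn.Sequential(nn.Flatten(), nn.Linear(32, 10)) for _ in range(4)]
moe = MoEWithCommonExpert(common, experts, feat_dim=16, top_k=2)
logits = moe(torch.randn(8, 32))  # -> shape (8, 10)
```

In a deployed FL system the experts would live on, and be trained by, different clients, with only the selected experts' outputs computed and communicated; the dense per-expert computation above is purely for clarity of the routing logic.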