🤖 AI Summary
Federated training of large language models (LLMs) on edge devices faces dual challenges of heterogeneous computational resources and non-IID data distributions.
Method: We propose a resource-adaptive Mixture-of-Experts (MoE) framework built on a bi-level optimization objective, in which a held-out validation set guides the router to align with the target distribution. The architecture combines globally shared generalist experts with device-specific specialist experts, dynamically allocated according to each device's local compute capacity and optimized via an alternating minimization algorithm.
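The bi-level structure can be illustrated with a toy sketch: experts are updated on the training loss while the router is updated on a held-out validation loss, alternating between the two. Everything below (the 1-D regression task, a single sigmoid gate over two scalar experts, learning rates) is an illustrative assumption, not the paper's actual implementation.

```python
# Toy alternating minimization for a two-expert mixture (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

# Data drawn from two linear regimes the experts should split between.
x_tr = rng.uniform(-1, 1, 200)
y_tr = np.where(x_tr > 0, 2.0 * x_tr, -1.0 * x_tr)
x_va = rng.uniform(-1, 1, 100)          # held-out set that guides the router
y_va = np.where(x_va > 0, 2.0 * x_va, -1.0 * x_va)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, w1, w2, a, b):
    g = sigmoid(a * x + b)              # input-dependent routing gate
    return g * w1 * x + (1 - g) * w2 * x, g

def mse(x, y, w1, w2, a, b):
    yhat, _ = predict(x, w1, w2, a, b)
    return np.mean((yhat - y) ** 2)

w1, w2, a, b = 0.1, -0.1, 0.0, 0.0      # expert slopes and router params
lr = 0.5
loss0 = mse(x_va, y_va, w1, w2, a, b)

for _ in range(300):
    # Step 1: router fixed, experts minimize the *training* loss.
    yhat, g = predict(x_tr, w1, w2, a, b)
    err = yhat - y_tr
    w1 -= lr * np.mean(2 * err * g * x_tr)
    w2 -= lr * np.mean(2 * err * (1 - g) * x_tr)
    # Step 2: experts fixed, router minimizes the *validation* loss,
    # aligning the routing with the target distribution.
    yhat, g = predict(x_va, w1, w2, a, b)
    err = yhat - y_va
    dgate = 2 * err * (w1 - w2) * x_va * g * (1 - g)
    a -= lr * np.mean(dgate * x_va)
    b -= lr * np.mean(dgate)

loss1 = mse(x_va, y_va, w1, w2, a, b)
print(f"val MSE: {loss0:.4f} -> {loss1:.4f}")
```

Updating the router on held-out data (rather than the same training batch) is what makes the objective bi-level: the routing decisions are validated against the target distribution instead of being fit to the same samples as the experts.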
Contribution/Results: This is the first work to enable scalable, privacy-preserving MoE training in federated settings. It significantly improves personalized text generation quality across heterogeneous devices: generalist experts mitigate local overfitting, while specialist experts ensure device-specific adaptation—yielding token-level outputs that balance generalization and personalization. The implementation is publicly available.
📝 Abstract
On-device LLMs have gained increasing attention for their ability to enhance privacy and provide a personalized user experience. To facilitate private learning with scarce data, Federated Learning has become a standard approach. However, it faces challenges such as computational resource heterogeneity and data heterogeneity among end users. We propose CoMiGS ($\textbf{Co}$llaborative learning with a $\textbf{Mi}$xture of $\textbf{G}$eneralists and $\textbf{S}$pecialists), the first approach to address both challenges. A key innovation of our method is the bi-level optimization formulation of the Mixture-of-Experts learning objective, where the router is optimized using a separate validation set to ensure alignment with the target distribution. We solve our objective with alternating minimization, for which we provide a theoretical analysis. Our method shares generalist experts across users while localizing a varying number of specialist experts, thereby adapting to users' computational resources and preserving privacy. Through extensive experiments, we show CoMiGS effectively balances general and personalized knowledge for each token generation. We demonstrate that CoMiGS remains robust against overfitting, due to the generalists' regularizing effect, while adapting to local data through specialist expertise. We open source our codebase for collaborative LLMs.
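The "varying number of specialist experts" per user can be pictured as a simple capacity-aware allocation rule: every device keeps the globally shared generalist expert(s) and fills its remaining compute budget with local specialists. The helper name, budget units, and all numbers below are assumptions for illustration, not values from the paper.

```python
# Illustrative capacity-aware allocation of specialist experts (not the
# paper's actual rule).

def allocate_specialists(device_flops_budget, expert_flops_cost,
                         n_generalists=1, max_specialists=8):
    """Return how many specialist experts a device can host locally."""
    # Shared generalist expert(s) are always kept on every device.
    remaining = device_flops_budget - n_generalists * expert_flops_cost
    if remaining <= 0:
        return 0  # low-end device: generalists only
    return min(max_specialists, remaining // expert_flops_cost)

# A heterogeneous fleet: budgets in arbitrary FLOPs units, one expert costs 10.
budgets = {"phone": 15, "tablet": 40, "laptop": 120}
counts = {d: allocate_specialists(b, expert_flops_cost=10)
          for d, b in budgets.items()}
print(counts)  # {'phone': 0, 'tablet': 3, 'laptop': 8}
```

Under such a rule, weaker devices still benefit from the shared generalists (which also regularize against local overfitting), while stronger devices add more specialists for deeper personalization.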