A Survey on Mixture of Experts in Large Language Models

📅 2024-06-26
🏛️ IEEE Transactions on Knowledge and Data Engineering
📈 Citations: 77 · Influential: 6
🤖 AI Summary
A systematic and comprehensive survey of Mixture-of-Experts (MoE) methods tailored to large language models (LLMs) has been lacking. Method: This work bridges that gap with a new taxonomy of MoE for LLMs spanning algorithmic design, systems implementation, and practical deployment, and it reviews the core designs of existing MoE models together with their technical principles, open-source implementations, hyperparameter configurations, and empirical evaluation protocols. The authors also maintain an actively updated, open GitHub repository collecting mainstream MoE models, training configurations, and benchmarking results. Contribution/Results: The paper delivers the first holistic, reproducible, and continuously evolving MoE survey for LLMs. It serves as a reference and practical guide for researchers and practitioners, enabling rigorous comparison, informed design choices, and accelerated development of scalable, efficient MoE-based LLMs.

📝 Abstract
Large language models (LLMs) have garnered unprecedented advancements across diverse fields, ranging from natural language processing to computer vision and beyond. The prowess of LLMs is underpinned by their substantial model size, extensive and diverse datasets, and the vast computational power harnessed during training, all of which contribute to the emergent abilities of LLMs (e.g., in-context learning) that are not present in small models. Within this context, the mixture of experts (MoE) has emerged as an effective method for substantially scaling up model capacity with minimal computation overhead, gaining significant attention from academia and industry. Despite its growing prevalence, there lacks a systematic and comprehensive review of the literature on MoE. This survey seeks to bridge that gap, serving as an essential resource for researchers delving into the intricacies of MoE. We first briefly introduce the structure of the MoE layer, followed by proposing a new taxonomy of MoE. Next, we overview the core designs for various MoE models including both algorithmic and systemic aspects, alongside collections of available open-source implementations, hyperparameter configurations and empirical evaluations. Furthermore, we delineate the multifaceted applications of MoE in practice, and outline some potential directions for future research. To facilitate ongoing updates and the sharing of cutting-edge advances in MoE research, we have established a resource repository at https://github.com/withinmiaov/A-Survey-on-Mixture-of-Experts-in-LLMs.
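The abstract's central object is the sparse MoE layer, in which a learned router activates only a few expert networks per token, which is how capacity grows with little added compute. Below is a minimal sketch of such a layer, assuming a standard top-k gated design in PyTorch; the class, parameter names, and sizes are illustrative and not taken from any specific model covered by the survey.

```python
# Minimal sketch of a sparse MoE layer: a learned router scores all experts per
# token, only the top-k experts are run, and their outputs are combined with the
# renormalized gate weights. Names and sizes are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router (gating network): one score per expert for each token.
        self.router = nn.Linear(d_model, num_experts)
        # Each expert is an independent feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        gate_logits = self.router(x)                                   # (tokens, experts)
        topk_logits, topk_idx = gate_logits.topk(self.top_k, dim=-1)   # keep k best experts
        topk_weights = F.softmax(topk_logits, dim=-1)                  # renormalize over the chosen k
        out = torch.zeros_like(x)
        # Dispatch each token to its selected experts and combine the weighted outputs.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e
                if mask.any():
                    out[mask] += topk_weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

if __name__ == "__main__":
    layer = SparseMoELayer(d_model=64, d_hidden=256)
    tokens = torch.randn(10, 64)
    print(layer(tokens).shape)  # torch.Size([10, 64])
```

Production systems replace the per-expert Python loop with batched dispatch/combine kernels and add load-balancing losses, which is part of the systems design space the survey covers.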
Problem

Research questions and friction points this paper is trying to address.

Lack of a systematic and comprehensive review of Mixture of Experts (MoE) in LLMs
Need for a taxonomy and an overview of the core designs of MoE models
Exploring applications and future directions for MoE research
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse MoE layers scale model capacity with minimal per-token compute overhead (see the sketch after this list)
A new taxonomy of MoE covering both algorithmic and systemic designs
An actively maintained resource repository for ongoing MoE updates
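To make the efficiency point concrete, here is a back-of-the-envelope sketch of how total and per-token active parameters diverge in a top-2 MoE feed-forward layer; the layer sizes are assumed for illustration and are not figures from the survey.

```python
# Illustrative arithmetic (assumed sizes, not from the survey): total parameters
# grow with the number of experts, but per-token compute only grows with the
# k experts each token is routed to.
d_model, d_hidden = 4096, 14336
num_experts, top_k = 8, 2

ffn_params = 2 * d_model * d_hidden              # one expert's feed-forward weights
dense_ffn_params = ffn_params                    # dense baseline: a single shared FFN
moe_total_params = num_experts * ffn_params      # capacity stored in the MoE layer
moe_active_params = top_k * ffn_params           # parameters touched per token

print(f"dense FFN params per layer : {dense_ffn_params / 1e6:.0f}M")
print(f"MoE total params per layer : {moe_total_params / 1e6:.0f}M")
print(f"MoE active params per token: {moe_active_params / 1e6:.0f}M")
# Capacity grows ~8x over the dense FFN while per-token compute grows only ~2x.
```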