BigMac: A Communication-Efficient Mixture-of-Experts Model Structure for Fast Training and Inference

📅 2025-02-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Fine-grained Mixture-of-Experts (MoE) models suffer from prohibitive All-to-All communication overhead, which severely hinders training and inference efficiency. To address this, the paper proposes BigMac, a novel MoE architecture designed for high communication efficiency. Its core innovation is the DCCA paradigm (descend → communicate → communicate → ascend), which replaces the conventional CDAC pattern (communicate → descend → ascend → communicate) so that All-to-All exchanges happen at a low dimension rather than the full hidden dimension. BigMac pairs these low-dimensional communication pathways with a redesigned expert structure, preserving modeling capacity while keeping the same number of experts and a similar total parameter count. Experiments on state-of-the-art MoE systems, including Megatron, Tutel, and DeepSpeed-Inference, show that BigMac matches or exceeds the accuracy of comparable fine-grained MoE models, reduces end-to-end training latency by up to 3.09×, and improves inference throughput by up to 3.11×.

📝 Abstract
The Mixture-of-Experts (MoE) structure scales Transformer-based large language models (LLMs) and improves their performance with only a sub-linear increase in computation resources. Recently, the fine-grained DeepSeekMoE structure was proposed, which further improves the computing efficiency of MoE without performance degradation. However, the All-to-All communication introduced by MoE has become a bottleneck, especially for the fine-grained structure, which typically involves and activates more experts and hence incurs heavier communication overhead. In this paper, we propose a novel MoE structure named BigMac, which is also fine-grained but has high communication efficiency. The key innovation of BigMac is that we abandon the communicate-descend-ascend-communicate (CDAC) manner used by fine-grained MoE, which forces the All-to-All communication to always take place at the highest dimension. Instead, BigMac adopts an efficient descend-communicate-communicate-ascend (DCCA) manner. Specifically, we add a descending and an ascending projection at the entrance and exit of the expert, respectively, which enables the communication to be performed at a very low dimension. Furthermore, to adapt to DCCA, we re-design the structure of the small experts, ensuring that each expert in BigMac has enough complexity to process tokens. Experimental results show that BigMac achieves comparable or even better model quality than fine-grained MoEs with the same number of experts and a similar number of total parameters. Equally importantly, BigMac reduces the end-to-end latency by up to 3.09× for training and increases the throughput by up to 3.11× for inference on state-of-the-art AI computing frameworks including Megatron, Tutel, and DeepSpeed-Inference.
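As a rough, single-process sketch of the CDAC-versus-DCCA ordering described above (not the paper's implementation): the layer sizes, module names, and exact placement of the descending/ascending projections below are illustrative assumptions, the router and multi-expert dispatch are omitted, and the All-to-All is stubbed out with an identity function so the snippet runs without a distributed setup. It shows only where communication occurs relative to the projections, and hence the width of the activations being exchanged.

```python
import torch
import torch.nn as nn

# Stand-in for the expert-parallel All-to-All. In a real MoE system this would be a
# torch.distributed collective; here it is an identity so the sketch runs single-process.
def all_to_all(x: torch.Tensor) -> torch.Tensor:
    print(f"All-to-All payload width: {x.shape[-1]} features per token")
    return x

d_model, d_expert, d_low = 4096, 1024, 256  # illustrative sizes, not taken from the paper

class CDACLayer(nn.Module):
    """Conventional fine-grained MoE layer (router omitted): communicate at d_model,
    then descend/ascend inside the expert."""
    def __init__(self):
        super().__init__()
        self.down = nn.Linear(d_model, d_expert)
        self.up = nn.Linear(d_expert, d_model)

    def forward(self, x):
        x = all_to_all(x)                       # dispatch tokens at full width (d_model)
        x = self.up(torch.relu(self.down(x)))   # descend -> ascend inside the expert
        return all_to_all(x)                    # combine results, again at full width

class DCCALayer(nn.Module):
    """BigMac-style ordering (sketch): descend before dispatch, ascend after combine,
    so both All-to-Alls move low-dimensional activations."""
    def __init__(self):
        super().__init__()
        self.descend = nn.Linear(d_model, d_low)  # projection at the expert entrance
        self.body = nn.Linear(d_low, d_low)       # placeholder for the redesigned expert body
        self.ascend = nn.Linear(d_low, d_model)   # projection at the expert exit

    def forward(self, x):
        x = self.descend(x)           # reduce dimension first
        x = all_to_all(x)             # dispatch at d_low instead of d_model
        x = torch.relu(self.body(x))
        x = all_to_all(x)             # combine, still at d_low
        return self.ascend(x)         # restore full dimension after communication

tokens = torch.randn(8, d_model)
CDACLayer()(tokens)   # prints a 4096-wide payload twice
DCCALayer()(tokens)   # prints a 256-wide payload twice
```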
Problem

Research questions and friction points this paper is trying to address.

All-to-All communication overhead bottlenecks fine-grained MoE models
High end-to-end latency during MoE training
Limited throughput during MoE inference
Innovation

Methods, ideas, or system contributions that make the work stand out.

BigMac enhances MoE communication efficiency
DCCA replaces CDAC so All-to-All communication happens at a lower dimension (illustrated after this list)
Redesigned experts maintain complexity and performance
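As a back-of-the-envelope illustration of why performing the All-to-All below the descending projection matters (all numbers are hypothetical, not results from the paper): the payload of each dispatch and combine scales linearly with the width at which tokens are exchanged, so shrinking that width shrinks the traffic proportionally.

```python
# Hypothetical sizes chosen only to illustrate the communication saving.
tokens_per_batch = 16384          # tokens dispatched per MoE layer
d_model, d_low = 4096, 256        # exchange width under CDAC vs. DCCA
bytes_per_elem = 2                # fp16/bf16 activations

def all_to_all_bytes(width: int) -> int:
    # Two All-to-Alls per MoE layer: dispatch and combine.
    return 2 * tokens_per_batch * width * bytes_per_elem

cdac = all_to_all_bytes(d_model)  # communicate at full model width
dcca = all_to_all_bytes(d_low)    # communicate after the descending projection
print(f"CDAC: {cdac / 2**20:.0f} MiB, DCCA: {dcca / 2**20:.0f} MiB, "
      f"reduction: {cdac / dcca:.0f}x")
```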
Zewen Jin
University of Science and Technology of China
LLM Training / Serving, MoE, Serverless Computing
Shengnan Wang
Huawei Technologies
Jiaan Zhu
University of Science and Technology of China
Computer Science, Large Language Model
Hongrui Zhan
University of Science and Technology of China
Youhui Bai
Huawei Technologies
Lin Zhang
Huawei Technologies
Zhenyu Ming
Huawei Technologies
Cheng Li
University of Science and Technology of China, Institute of Artificial Intelligence, Hefei Comprehensive National Science Center