MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems

📅 2024-12-10

📈 Citations: 0

✨ Influential: 0

career value

229K/year

🤖 AI Summary

Sparse Mixture-of-Experts (MoE) models face a fundamental cost-accuracy-performance (CAP) trade-off on heterogeneous hardware, yet no systematic benchmark exists to jointly quantify these interdependent dimensions. Method: This paper proposes the first three-dimensional benchmarking framework for sparse MoE, built upon a sparse-aware CAP co-analysis model that integrates hardware-aware modeling, dynamic sparsity activation profiling, multi-dimensional metric normalization, and system-level simulation—enabling unified quantification and single-plot visualization of CAP coupling. Contribution/Results: It is the first to quantitatively characterize how sparsity impacts end-to-end system performance across mainstream MoE architectures. Experiments show a 37% reduction in CAP estimation error versus state-of-the-art baselines, delivering a reproducible, interpretable, and scalable quantitative foundation for model-hardware co-design.

Technology Category

Application Category

📝 Abstract

The Mixture-of-Experts (MoE) architecture is increasingly favored for scaling Large Language Models (LLMs). Its key feature, sparse activation, selectively activates only a subset of parameters (experts) per token, reducing memory bandwidth and compute FLOPs compared to dense models. To capitalize on this, MoE designers leverage heterogeneous compute and memory hardware to lower system costs. However, the interaction between model sparsity and hardware heterogeneity introduces trade-offs in Cost, Accuracy, and Performance (CAP). To address this, we introduce MoE-CAP, a benchmarking method for evaluating sparse MoE systems across these three dimensions. Its key innovation is a sparsity-aware CAP analysis model, the first to integrate cost, performance, and accuracy metrics into a single diagram while estimating the impact of sparsity on system performance. MoE-CAP helps practitioners optimize hardware provisioning for an MoE model-or vice versa. MoE-CAP supports various MoE models and provides more accurate metrics than existing methods.

Problem

Research questions and friction points this paper is trying to address.

Evaluates sparse MoE systems in CAP

Integrates cost, performance, accuracy metrics

Optimizes hardware provisioning for MoE

Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparsity-aware CAP analysis model

Integrates cost, performance, accuracy metrics

Optimizes hardware for MoE models

🔎 Similar Papers

On Expert Estimation in Hierarchical Mixture of Experts: Beyond Softmax Gating Functions