Mixture of Experts in Image Classification: What's the Sweet Spot?

📅 2024-11-27
🏛️ arXiv.org
📈 Citations: 4
Influential: 0
🤖 AI Summary
Existing Mixture-of-Experts (MoE) models exhibit limited applicability in image classification, often requiring billion-scale datasets to achieve competitive performance.

Method: This work systematically investigates parameter-efficient scaling of MoE architectures on open vision benchmarks (e.g., ImageNet), proposing a dynamic sparse-gated MoE framework built on ViT or CNN backbones that supports sample-wise routing and multi-granularity expert configurations.

Contribution/Results: We identify a “sweet spot” for MoE in image classification: accuracy peaks at a moderate activated parameter count, and further expansion yields diminishing returns or even degradation, revealing a non-monotonic relationship between activated parameters and accuracy. Through extensive experiments, we establish reproducible configuration guidelines and empirical upper-bound performance benchmarks. To our knowledge, this is the first systematic, empirically grounded analysis framework for vision-oriented MoE models.
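The “dynamic sparse-gated” mechanism with sample-wise routing mentioned above can be illustrated with a minimal sketch: a gate scores every expert per sample, only the top-k experts are activated, and their outputs are combined with renormalized gate weights. This is a generic top-k MoE sketch in plain Python, not the paper's implementation; all function names and the toy experts are illustrative.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def top_k_route(gate_logits, k):
    """Sample-wise top-k routing: keep the k largest gate logits,
    renormalize their softmax weights, and drop the rest.
    Returns {expert_index: weight} for the activated experts."""
    ranked = sorted(range(len(gate_logits)),
                    key=lambda i: gate_logits[i], reverse=True)
    kept = ranked[:k]
    weights = softmax([gate_logits[i] for i in kept])
    return dict(zip(kept, weights))

def moe_layer(x, experts, gate, k=2):
    """Apply a sparse MoE layer to one sample's feature vector x.
    experts: list of callables (vector -> vector); gate: callable
    producing one logit per expert. Only k experts run per sample,
    which is what keeps the activated parameter count moderate."""
    routing = top_k_route(gate(x), k)
    out = [0.0] * len(x)
    for idx, w in routing.items():
        y = experts[idx](x)  # only activated experts are evaluated
        out = [o + w * yi for o, yi in zip(out, y)]
    return out
```

Increasing k (or expert size) raises the activated parameter count per sample, which is the axis along which the paper reports the non-monotonic accuracy curve.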

📝 Abstract
Mixture-of-Experts (MoE) models have shown promising potential for parameter-efficient scaling across various domains. However, the implementation in computer vision remains limited, and often requires large-scale datasets comprising billions of samples. In this study, we investigate the integration of MoE within computer vision models and explore various MoE configurations on open datasets. When introducing MoE layers in image classification, the best results are obtained for models with a moderate number of activated parameters per sample. However, such improvements gradually vanish when the number of parameters per sample increases.
Problem

Research questions and friction points this paper is trying to address.

Optimizing MoE integration for efficient image classification scaling
Determining ideal parameter activation trade-offs in vision models
Identifying effective MoE configurations across different model architectures
Innovation

Methods, ideas, or system contributions that make the work stand out.

MoE layers integrated into image classification architectures
Moderate parameter activation optimizes the performance-efficiency trade-off
Last-2 placement heuristic provides robust cross-architecture choice
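The “last-2 placement” heuristic above can be sketched as a layer plan: keep dense FFNs in the early blocks and swap in MoE layers only in the final two. This is a hypothetical illustration of the heuristic, not the paper's code; the function and field names are invented.

```python
def build_layer_plan(num_blocks, moe_last_n=2):
    """Illustrative 'last-n' placement: mark the FFN of the final
    moe_last_n transformer blocks as MoE, all earlier ones as dense."""
    plan = []
    for i in range(num_blocks):
        ffn = "moe" if i >= num_blocks - moe_last_n else "dense"
        plan.append({"block": i, "ffn": ffn})
    return plan
```

For a 12-block ViT-style backbone, `build_layer_plan(12)` marks blocks 10 and 11 as MoE and leaves the rest dense, which is the cross-architecture default the summary describes as robust.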