🤖 AI Summary
This work investigates why Mixture-of-Experts (MoE) models outperform dense networks under identical parameter budgets when input features are noisy. Focusing on inputs with latent modular structure corrupted by noise, the study proposes that MoE's sparse activation mechanism implicitly filters that noise. Through theoretical analysis, synthetic-data experiments, and real-world language tasks, the authors show that sparse, modular computation gives MoE lower generalization error, greater robustness to perturbations, and faster convergence than a dense counterpart. The findings indicate that MoE not only preserves computational efficiency but also improves generalization, offering a new perspective on the advantages of sparse architectures in noisy environments.
📝 Abstract
Despite their practical success, it remains unclear why Mixture of Experts (MoE) models can outperform dense networks beyond sheer parameter scaling. We study an iso-parameter regime in which inputs exhibit latent modular structure but are corrupted by feature noise, a proxy for noisy internal activations. We show that sparse expert activation acts as a noise filter: compared to a dense estimator, MoEs achieve lower generalization error under feature noise, improved robustness to perturbations, and faster convergence. Empirical results on synthetic data and real-world language tasks corroborate the theoretical analysis, demonstrating consistent robustness and efficiency gains from sparse modular computation.
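The sparse expert activation described above can be sketched as a top-k gated MoE layer. This is a minimal illustration of the routing mechanism, not the authors' implementation; all names, shapes, and values here are assumptions. The key point is that only the k selected experts run on a given input, so each input is processed by a specialized subnetwork rather than the full parameter set:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_forward(x, gate_w, experts, k=1):
    """Route input x to the top-k experts by gate score.

    Sparse activation: only k of len(experts) experts execute,
    so computation is restricted to the sub-module the router
    deems relevant for this input.
    """
    scores = gate_w @ x                       # one gating score per expert
    top = np.argsort(scores)[-k:]             # indices of the top-k experts
    weights = softmax(scores[top])            # renormalize over selected experts
    return sum(w * (experts[i] @ x) for w, i in zip(weights, top))

# Toy setup (illustrative only): 4 linear experts on an 8-dim input
d, n_experts = 8, 4
gate_w = rng.normal(size=(n_experts, d))
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]

x = rng.normal(size=d)
y = moe_forward(x, gate_w, experts, k=1)
assert y.shape == (d,)
```

With k=1 only a quarter of the expert parameters touch any given input, which is the sparsity that the paper argues acts as an implicit filter against feature noise.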