🤖 AI Summary
This work addresses the high cost of extending large language models to low-resource languages by proposing a sparsity-aware expansion method grounded in neuron-level language specificity. By analyzing the distribution of language-specific neurons across Transformer layers, the study reveals, for the first time at neuron-level granularity, how cross-lingual representational differences manifest in multilingual models. Leveraging these insights, the authors dynamically allocate the number of experts per layer in a Mixture-of-Experts (MoE) architecture. Evaluated on Llama-3.2-3B with Greek, Turkish, and Hungarian, the approach reduces parameter count by approximately 40% on average while matching the performance of a LayerMoE baseline. Notably, the low-resource language experts spontaneously develop neuron specialization patterns in the initial and final layers that resemble those of high-resource languages.
📝 Abstract
Extending large language models to low-resource languages is essential for global accessibility, but training separate models per language is prohibitively expensive. Mixture-of-Experts (MoE) architectures address this by adding sparse language-specific parameters, but determining how many experts each layer needs remains an open question. Current approaches allocate experts based on layer-level similarity, yet language processing exhibits fine-grained specialization at the level of individual neurons. We propose $\textbf{NeuronMoE}$, a method that analyzes language-specific neurons across all transformer components and allocates experts per layer according to empirically measured cross-lingual neuron diversity. Applied to Llama-3.2-3B for low-resource languages (Greek, Turkish, and Hungarian), this approach achieves approximately 40% average parameter reduction while matching the performance of the LayerMoE baseline. We find that low-resource language experts independently develop neuron specialization patterns mirroring those of the high-resource language, concentrated in early and late layers. This suggests potential universal architectural principles in how multilingual models organize linguistic knowledge.
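The core idea of the abstract can be sketched in code: identify language-specific neurons per layer from activation statistics, then size each layer's expert pool in proportion to its cross-lingual neuron diversity. The sketch below is illustrative, not the paper's implementation; the specificity criterion (activation-probability margin over other languages), the `threshold` value, and the proportional allocation rule are all assumptions made for demonstration.

```python
import numpy as np

def language_specific_neurons(act_probs, threshold=0.3):
    """Flag neurons specific to each language.

    act_probs: array of shape (num_languages, num_layers, num_neurons)
    holding each neuron's activation probability per language.
    A neuron counts as "specific" to language l if its activation
    probability exceeds the maximum over all other languages by
    `threshold` (an illustrative criterion, not the paper's).
    """
    num_langs = act_probs.shape[0]
    specific = np.zeros_like(act_probs, dtype=bool)
    for l in range(num_langs):
        others_max = np.delete(act_probs, l, axis=0).max(axis=0)
        specific[l] = (act_probs[l] - others_max) > threshold
    return specific

def allocate_experts(specific, total_experts):
    """Distribute a fixed expert budget across layers.

    Layers with more language-specific neurons (higher cross-lingual
    diversity) receive proportionally more experts.
    """
    # Per-layer diversity: neurons specific to at least one language.
    diversity = specific.any(axis=0).sum(axis=1).astype(float)
    if diversity.sum() == 0:
        # Degenerate case: spread experts uniformly.
        return np.full(specific.shape[1], total_experts // specific.shape[1])
    share = diversity / diversity.sum()
    alloc = np.floor(share * total_experts).astype(int)
    # Hand out the rounding remainder to the largest fractional parts.
    remainder = total_experts - alloc.sum()
    order = np.argsort(-(share * total_experts - alloc))
    alloc[order[:remainder]] += 1
    return alloc

# Toy example: 2 languages, 3 layers, 5 neurons per layer.
probs = np.zeros((2, 3, 5))
probs[0, 0, :4] = 0.9  # language 0 dominates 4 neurons in layer 0
probs[1, 2, :2] = 0.9  # language 1 dominates 2 neurons in layer 2
spec = language_specific_neurons(probs)
alloc = allocate_experts(spec, total_experts=6)
print(alloc)  # more experts in high-diversity layers 0 and 2
```

Under this toy setup, layer 1 has no language-specific neurons and receives no experts, while layers 0 and 2 split the budget in proportion to their specific-neuron counts, matching the paper's premise that expert capacity should follow neuron-level specialization rather than a uniform per-layer quota.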