NeuronMoE: Neuron-Guided Mixture-of-Experts for Efficient Multilingual LLM Extension

📅 2026-03-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high cost of extending large language models to low-resource languages by proposing a sparsity-aware expansion method grounded in neuron-level language specificity. By analyzing how language-specific neurons are distributed across Transformer layers, the study shows, for the first time at neuron-level granularity, how cross-lingual representational differences manifest in multilingual models. Leveraging these insights, the authors dynamically allocate the number of experts per layer in a Mixture-of-Experts (MoE) architecture. Evaluated on Llama-3.2-3B with Greek, Turkish, and Hungarian, the approach reduces parameter count by roughly 40% on average while matching the performance of a LayerMoE baseline. Notably, low-resource languages spontaneously develop neuron specialization patterns in the initial and final layers that resemble those of high-resource languages.
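To make the allocation idea concrete, here is a minimal, hypothetical Python sketch of neuron-guided per-layer expert budgeting. The paper's actual criterion and neuron-identification procedure are not specified here; this sketch simply assumes each layer comes with a set of language-specific neuron indices per language, scores each layer by how little those sets overlap (cross-lingual neuron diversity), and spends a fixed expert budget proportionally. The function names and the diversity formula are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of NeuronMoE-style expert allocation.
# Assumption: layers where languages share fewer language-specific
# neurons (high diversity) receive more experts.

def neuron_diversity(layer_neurons):
    """Fraction of language-specific neurons NOT shared by all languages.

    layer_neurons: dict mapping language code -> set of neuron indices
    identified as specific to that language in this layer.
    """
    all_neurons = set().union(*layer_neurons.values())
    if not all_neurons:
        return 0.0
    shared = set.intersection(*layer_neurons.values())
    return 1.0 - len(shared) / len(all_neurons)

def allocate_experts(per_layer_neurons, total_experts):
    """Distribute a fixed expert budget across layers in proportion to
    each layer's cross-lingual neuron diversity (minimum one expert
    per layer, as a simple illustrative constraint)."""
    scores = [neuron_diversity(layer) for layer in per_layer_neurons]
    total = sum(scores) or 1.0  # avoid division by zero
    remaining = total_experts - len(scores)
    return [1 + round(remaining * s / total) for s in scores]

# Toy example: layer 0 fully shared, layer 1 fully language-specific.
layers = [
    {"el": {0, 1, 2}, "tr": {0, 1, 2}},  # identical sets -> diversity 0
    {"el": {0, 1}, "tr": {2, 3}},        # disjoint sets  -> diversity 1
]
print(allocate_experts(layers, 6))  # -> [1, 5]
```

Under this toy scoring, the fully shared layer keeps the one-expert floor while the diverse layer absorbs the rest of the budget, which mirrors the summary's point that uniform per-layer allocation wastes parameters on layers with little cross-lingual divergence.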

📝 Abstract
Extending large language models to low-resource languages is essential for global accessibility, but training separate models per language is prohibitively expensive. Mixture-of-Experts (MoE) architectures address this by adding sparse language-specific parameters, but determining how many experts each layer needs remains an open question. Current approaches allocate experts based on layer-level similarity, yet language processing exhibits fine-grained specialization at individual neurons. We propose $\textbf{NeuronMoE}$, a method that analyzes language-specific neurons across all transformer components to guide expert allocation per layer based on empirically measured cross-lingual neuron diversity. Applied to Llama-3.2-3B for low-resource languages (Greek, Turkish, and Hungarian), this approach achieves approximately 40% average parameter reduction while matching the performance of the LayerMoE baseline. We find that low-resource language experts independently develop neuron specialization patterns mirroring those of the high-resource language, concentrated in early and late layers. This reveals potential universal architectural principles in how multilingual models organize linguistic knowledge.
Problem

Research questions and friction points this paper is trying to address.

Mixture-of-Experts
multilingual LLM
low-resource languages
expert allocation
neuron specialization
Innovation

Methods, ideas, or system contributions that make the work stand out.

NeuronMoE
Mixture-of-Experts
cross-lingual neuron diversity
multilingual LLM
parameter-efficient adaptation