MultiPL-MoE: Multi-Programming-Lingual Extension of Large Language Models through Hybrid Mixture-of-Experts

📅 2025-08-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the suboptimal performance of multilingual code generation under resource constraints, this paper proposes a dual-granularity Mixture-of-Experts (MoE) extension. At the token level, it introduces shared experts and gated weight normalization; at the code-segment level, it designs a sliding-window segmentation scheme coupled with a top-k active routing mechanism—jointly modeling syntactic structure and contextual patterns. The approach avoids full-parameter fine-tuning, significantly reducing computational overhead while preserving strong generative capability across mainstream programming languages. Experiments demonstrate consistent superiority over same-scale baseline models on multilingual code generation benchmarks, achieving a more favorable trade-off between performance gain and resource efficiency. This work establishes a scalable architectural paradigm for lightweight multilingual large language models for code.

📝 Abstract
Despite LLMs' excellent code creation capabilities, multilingual code generation remains extremely challenging. To address this, we intend to improve the multi-programming-lingual (MultiPL) performance of base LLMs while retaining their capability on the most popular languages, using restricted computational resources. We consider MultiPL to be a special case of multiple natural languages and propose a MultiPL extension of LLMs utilizing a hybrid mixture of experts (MoE), called MultiPL-MoE. Specifically, MultiPL-MoE combines two paired MoEs to optimize expert selection at both the token and segment levels. The token-level MoE is a standard upcycling MoE structure with a shared expert and a novel gate weight normalization approach that aids in the final fusion with the segment-level MoE. The segment-level MoE incorporates two innovative designs to better capture the syntactic structure and contextual patterns of programming languages: first, a sliding window partitions the input token sequence into multiple segments; then, an expert-choice routing strategy allows experts to select the top-k segments. Experimental results demonstrate the effectiveness of MultiPL-MoE.
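The segment-level mechanism in the abstract can be illustrated with a minimal sketch: a sliding window partitions the token sequence into segments, and each expert then selects its top-k segments by gate score (expert-choice routing, where experts pick inputs rather than inputs picking experts). All function names are illustrative, and random scores stand in for a learned gate; this is not the paper's implementation.

```python
# Hypothetical sketch of sliding-window segmentation plus expert-choice
# top-k routing, as described in the MultiPL-MoE abstract. Names, window
# size, and the random scoring gate are assumptions for illustration.
import random
from typing import List


def sliding_window_segments(tokens: List[int], window: int, stride: int) -> List[List[int]]:
    """Partition a token sequence into (possibly overlapping) segments."""
    segments = []
    for start in range(0, max(len(tokens) - window, 0) + 1, stride):
        segments.append(tokens[start:start + window])
    return segments


def expert_choice_route(scores: List[List[float]], k: int) -> List[List[int]]:
    """Expert-choice routing: each expert selects its top-k segments.

    scores[e][s] is the gate affinity of expert e for segment s.
    Returns, per expert, the indices of its chosen segments.
    """
    assignments = []
    for expert_scores in scores:
        ranked = sorted(range(len(expert_scores)),
                        key=lambda s: expert_scores[s], reverse=True)
        assignments.append(ranked[:k])
    return assignments


tokens = list(range(10))
segs = sliding_window_segments(tokens, window=4, stride=2)

random.seed(0)
# Four experts score each segment (random stands in for a learned gate).
scores = [[random.random() for _ in segs] for _ in range(4)]
routes = expert_choice_route(scores, k=2)
```

Because experts choose segments (rather than segments choosing experts), each expert's load is fixed at k, which avoids the load-imbalance problem of token-choice gating.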
Problem

Research questions and friction points this paper is trying to address.

Improving multilingual code generation in LLMs
Optimizing expert selection at token and segment levels
Capturing syntactic structure and contextual patterns of programming languages
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid mixture-of-experts token and segment optimization
Sliding window segments with expert-choice routing strategy
Gate weight normalization for enhanced expert fusion
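The gate weight normalization idea listed above can be sketched as follows: normalize the gate logits of the token-level and segment-level branches on a common scale so their expert outputs can be fused into one mixture. The joint softmax and all names here are assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch (not the paper's code) of gate weight normalization
# for fusing token-level and segment-level MoE outputs. The joint softmax
# over both branches' logits is an assumed normalization scheme.
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def fuse_moe_outputs(token_logits: List[float],
                     segment_logits: List[float],
                     token_out: List[float],
                     segment_out: List[float]) -> float:
    """Normalize both branches' gate logits jointly, then mix expert outputs.

    Outputs are scalars here for simplicity; in a real model they would be
    hidden-state vectors combined with the same weights.
    """
    weights = softmax(token_logits + segment_logits)  # one distribution over all experts
    outputs = token_out + segment_out
    return sum(w * o for w, o in zip(weights, outputs))
```

With equal logits in both branches, every expert receives the same weight, so the fused output is the plain average of all expert outputs; learned logits would instead tilt the mixture toward the better-matching branch.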
Qing Wang
JIUTIAN Team China Mobile, Beijing, China
Xue Han
Professor of Biomedical Engineering, Boston University
Neuroengineering · Neuroscience
Jiahui Wang
JIUTIAN Team China Mobile, Beijing, China
Lehao Xing
JIUTIAN Team China Mobile, Beijing, China
Qian Hu
JIUTIAN Team China Mobile, Beijing, China
Lianlian Zhang
JIUTIAN Team China Mobile, Beijing, China
Chao Deng
JIUTIAN Team China Mobile, Beijing, China
Junlan Feng
Chief Scientist at China Mobile Research
Natural Language · Machine Learning · Speech Processing · Data Mining