🤖 AI Summary
To address the suboptimal performance of multilingual code generation under resource constraints, this paper proposes a dual-granularity Mixture-of-Experts (MoE) extension. At the token level, it introduces a shared expert and gate weight normalization; at the code-segment level, it designs a sliding-window segmentation scheme coupled with a top-k expert-choice routing mechanism, jointly modeling syntactic structure and contextual patterns. The approach avoids full-parameter fine-tuning, significantly reducing computational overhead while preserving strong generative capability across mainstream programming languages. Experiments show consistent gains over same-scale baseline models on multilingual code generation benchmarks, achieving a more favorable trade-off between performance and resource efficiency. This work establishes a scalable architectural paradigm for lightweight multilingual code LLMs.
📝 Abstract
Despite the excellent code generation capabilities of LLMs, multilingual code generation remains extremely challenging. To address this, we intend to improve the multi-programming-lingual (MultiPL) performance of base LLMs while retaining their performance on the most popular programming languages, using restricted computational resources. We regard MultiPL as a special case of multiple natural languages and propose a MultiPL extension of LLMs utilizing a hybrid mixture of experts (MoE), called MultiPL-MoE. Specifically, MultiPL-MoE combines two paired MoEs to optimize expert selection at both the token and segment levels. The token-level MoE is a standard upcycling MoE structure with a shared expert and a novel gate weight normalization approach that aids the final fusion with the segment-level MoE. The segment-level MoE incorporates two innovative designs to better capture the syntactic structure and contextual patterns of programming languages: first, a sliding window partitions the input token sequence into multiple segments; second, an expert-choice routing strategy allows each expert to select its top-k segments. Experimental results demonstrate the effectiveness of MultiPL-MoE.
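The token-level branch can be sketched in a few lines. This is a minimal NumPy sketch under stated assumptions, not the paper's implementation: the function names are hypothetical, the router is a plain softmax, and "gate weight normalization" is interpreted here as renormalizing the selected experts' gate weights to sum to one so the routed output stays on a scale comparable to the shared expert's.

```python
import numpy as np

def token_moe_forward(x, expert_fns, shared_fn, gate_w, top_k=2):
    """Token-level MoE sketch: every token passes through the shared expert,
    plus its top-k routed experts with normalized gate weights.

    x:          (d,) hidden state of one token.
    expert_fns: list of callables, one per routed expert.
    shared_fn:  callable for the always-active shared expert.
    gate_w:     (num_experts, d) router matrix.

    NOTE: treating gate weight normalization as renormalizing the selected
    experts' weights to sum to 1 is our assumption, not the paper's spec.
    """
    logits = gate_w @ x                       # router scores, one per expert
    top = np.argsort(-logits)[:top_k]         # indices of the top-k experts
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                              # normalize over selected experts only
    routed = sum(wi * expert_fns[e](x) for wi, e in zip(w, top))
    return shared_fn(x) + routed              # shared expert is always added
```

Because the selected gate weights sum to one, the routed contribution is a convex combination of expert outputs, which keeps its magnitude stable for the later fusion with the segment-level branch.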
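The segment-level designs (sliding-window segmentation and expert-choice top-k routing) can likewise be illustrated with a short sketch. Again this is an assumption-laden toy version: the helper names, the mean-pooled segment embeddings implied by the interface, and the softmax-over-segments scoring are ours; only the overall scheme (experts choosing their top-k segments, rather than segments choosing experts) follows the abstract.

```python
import numpy as np

def sliding_window_segments(tokens, window, stride):
    """Partition a token sequence into (possibly overlapping) fixed-size segments."""
    return [tokens[i:i + window] for i in range(0, len(tokens) - window + 1, stride)]

def expert_choice_routing(segment_embeds, expert_weights, k):
    """Expert-choice routing: each expert selects its top-k segments by affinity.

    segment_embeds: (num_segments, d) pooled segment representations.
    expert_weights: (num_experts, d) one routing vector per expert.
    Returns {expert_idx: list of k segment indices it selects}.
    """
    scores = expert_weights @ segment_embeds.T            # (num_experts, num_segments)
    probs = np.exp(scores - scores.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)             # softmax over segments
    return {e: np.argsort(-probs[e])[:k].tolist() for e in range(len(expert_weights))}
```

A useful property of expert-choice routing, visible in the sketch, is that every expert processes exactly k segments, so the load across experts is balanced by construction rather than by an auxiliary loss.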