FPMoE: A Sparse Mixture-of-Experts Approach to Functional Code Generation

📅 2026-05-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the underperformance of large language models when generating code in functional programming languages such as Haskell, OCaml, and Scala, where a model must balance language-specific features against cross-language commonalities. The authors propose the first application of a sparse mixture-of-experts (MoE) architecture to code generation across multiple functional languages, introducing a lightweight model with three language-specific routed experts and one shared expert. A dedicated language-routing mechanism separates language-specific details from shared functional patterns, such as monadic reasoning and type-directed programming, which mitigates interference across languages. Evaluated on the FPEval benchmark, the model activates only 3B parameters yet achieves performance comparable to much larger models such as DeepSeek-Coder-6.7B and Qwen2.5-Coder-14B-Instruct.

📝 Abstract

Despite rapid progress in LLM-based code generation, existing models are predominantly trained on imperative languages, leaving functional programming languages (FPLs) such as Haskell, OCaml, and Scala chronically underexplored, with even frontier models performing substantially worse on FPLs. Fine-tuning is a natural remedy, but our experiments show that per-language fine-tuning fails to capture shared functional abstractions, while merged multi-language fine-tuning introduces cross-language interference. To address this, we introduce FPMoE, a lightweight, open-source code generation model built on a sparse Mixture-of-Experts (MoE) architecture with three language-specific routed experts (one each for Haskell, OCaml, and Scala) and a shared expert that captures cross-language functional patterns such as monadic reasoning and type-directed programming. This design resolves both failure modes simultaneously: dedicated experts eliminate interference, while the shared expert preserves abstractions that per-language models miss. On FPEval, FPMoE substantially outperforms fine-tuned baselines and, with only 3B active parameters, matches the performance of much larger models including DeepSeek-Coder-6.7B, Qwen2.5-Coder-14B-Instruct, and Qwen3-Coder-30B-A3B.
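
The expert layout described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes hard routing by an explicit language tag and assumes the shared and routed expert outputs are simply summed, neither of which the abstract specifies. The names `LanguageRoutedMoE` and `make_linear_expert` are hypothetical, and each "expert" is a trivial scale-and-shift function standing in for a real feed-forward block.

```python
from typing import Callable, Dict, List

Vector = List[float]
Expert = Callable[[Vector], Vector]


def make_linear_expert(scale: float, bias: float) -> Expert:
    """Hypothetical stand-in for an expert feed-forward block."""
    def expert(x: Vector) -> Vector:
        return [scale * v + bias for v in x]
    return expert


class LanguageRoutedMoE:
    """Toy layer: one routed expert per language plus one shared expert."""

    def __init__(self, routed: Dict[str, Expert], shared: Expert) -> None:
        self.routed = routed
        self.shared = shared

    def forward(self, x: Vector, language: str) -> Vector:
        if language not in self.routed:
            raise ValueError(f"no routed expert for language: {language!r}")
        # The shared expert runs for every input, whatever the language.
        shared_out = self.shared(x)
        # Only the expert matching the language tag is activated (sparse step).
        routed_out = self.routed[language](x)
        # Assumption: the two outputs are combined by element-wise addition.
        return [s + r for s, r in zip(shared_out, routed_out)]


layer = LanguageRoutedMoE(
    routed={
        "haskell": make_linear_expert(2.0, 0.0),
        "ocaml": make_linear_expert(3.0, 0.0),
        "scala": make_linear_expert(4.0, 0.0),
    },
    shared=make_linear_expert(1.0, 0.5),
)

print(layer.forward([1.0, 2.0], "haskell"))
```

In this sketch only two of the four experts run for any input, which is the sense in which a sparse design keeps the active parameter count well below the total. A real model could instead route per token with a learned gate; the abstract does not say which variant FPMoE uses.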

Problem

Research questions and friction points this paper is trying to address.

functional programming languages

code generation

cross-language interference

shared abstractions

fine-tuning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture-of-Experts

functional programming

code generation