Large Language Models for Multilingual Code Intelligence: A Survey

📅 2026-04-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current large language models excel at code generation for high-resource languages such as Python but perform markedly worse on low-resource languages like Rust and OCaml, falling short of the demands of real-world multilingual software systems. This work systematically surveys two key tasks in multilingual code intelligence: natural-language-instructed code generation across multiple programming languages, and semantically consistent cross-lingual code translation. It reviews prevailing methodologies, benchmark datasets, and evaluation metrics for both. It emphasizes the cross-lingual generalization capabilities of large language models in multilingual code tasks and identifies core challenges, including inadequate support for low-resource languages and the difficulty of ensuring cross-lingual semantic consistency. The paper concludes by outlining promising directions for future research toward trustworthy multilingual code understanding and generation.

📝 Abstract

Large language models have transformed AI-assisted software engineering, but current research remains biased toward high-resource languages such as Python, with weaker performance in languages like Rust and OCaml. Since real-world systems are inherently polyglot, robust multilingual code intelligence is crucial. This survey focuses on two key tasks: multilingual code generation from shared natural-language requirements, and multilingual code translation that preserves semantics across languages. It reviews representative methods, benchmarks, and evaluation metrics, and highlights challenges and opportunities for trustworthy cross-language generalization.

Problem

Research questions and friction points this paper is trying to address.

multilingual code intelligence

large language models

code generation

code translation

cross-language generalization

Innovation

Methods, ideas, or system contributions that make the work stand out.

multilingual code intelligence

large language models

code generation