MdEval: Massively Multilingual Code Debugging

📅 2024-11-04
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
🤖 AI Summary
Existing programming benchmarks are heavily biased toward Python and lack comprehensive evaluation of multilingual debugging capabilities. Method: We introduce MDEVAL, the first large-scale multilingual code debugging benchmark comprising 3.6K samples across 18 programming languages, covering automatic program repair, code review, and defect identification. We propose xDebugGen, a cross-language defect injection framework, to construct MDEVAL-INSTRUCT—a dedicated instruction-tuning dataset—and train xDebugCoder, a specialized multilingual debugging model capable of modeling language-specific defects (e.g., Rust ownership violations, C memory errors). Our approach integrates syntax-aware defect modeling, multi-task evaluation, and synthetic-data-driven instruction fine-tuning. Contribution/Results: Experiments reveal that leading open-source models significantly underperform proprietary models (e.g., GPT, Claude) on multilingual debugging tasks. MDEVAL establishes a standardized evaluation platform and provides a strong baseline model, advancing research in multilingual intelligent code debugging.
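The cross-language defect-injection idea behind xDebugGen can be sketched, in heavily simplified form, as a rule-based transform that rewrites a correct solution into a (buggy, fixed) training pair. The rule table and function names below are illustrative assumptions for this summary, not the paper's actual implementation:

```python
import re

# Illustrative rule-based defect injection (a sketch of the general idea,
# NOT xDebugGen's actual implementation). Each rule rewrites a correct
# snippet into a buggy variant, yielding a (buggy, fixed) instruction pair.
INJECTION_RULES = [
    ("operator-swap", r"<=", "<"),    # boundary/off-by-one style bug
    ("operator-swap", r"\+=", "-="),  # wrong accumulation direction
]

def inject_defect(correct_code: str):
    """Return (bug_type, buggy_code) for the first applicable rule, else None."""
    for bug_type, pattern, replacement in INJECTION_RULES:
        if re.search(pattern, correct_code):
            # Apply the rule once so exactly one defect is introduced.
            return bug_type, re.sub(pattern, replacement, correct_code, count=1)
    return None

correct = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
print(inject_defect(correct))
```

Pairing each injected-bug variant with its original solution gives supervised (buggy, fixed) examples; the paper's actual framework additionally covers language-specific defect classes such as Rust ownership violations.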

📝 Abstract
Code large language models (LLMs) have made significant progress in code debugging by directly generating correct code from a buggy code snippet. Programming benchmarks, typically consisting of buggy code snippets and their associated test cases, are used to assess the debugging capabilities of LLMs. However, many existing benchmarks focus primarily on Python and offer limited language diversity (e.g., DebugBench and DebugEval). To advance the field of multilingual debugging with LLMs, we propose the first massively multilingual debugging benchmark, which includes 3.6K test samples across 18 programming languages and covers the automated program repair (APR), code review (CR), and bug identification (BI) tasks. Further, we introduce the debugging instruction corpus MDEVAL-INSTRUCT by injecting bugs into correct multilingual queries and solutions (xDebugGen). We also train a multilingual debugger, xDebugCoder, on MDEVAL-INSTRUCT as a strong baseline designed to handle bugs across a wide range of programming languages (e.g., "Missing Mut" in Rust and "Misused Macro Definition" in C). Our extensive experiments on MDEVAL reveal a notable performance gap between open-source models and closed-source LLMs (e.g., the GPT and Claude series), highlighting substantial room for improvement in multilingual code debugging scenarios.
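As the abstract notes, each benchmark sample pairs buggy code with test cases, so a candidate repair can be judged by execution. The harness below is a minimal hypothetical sketch of that idea (the entry-point name `solution` and the test-case format are assumptions, not MDEVAL's actual evaluation code):

```python
# Hypothetical repair-checking harness (not MDEVAL's actual evaluator):
# a candidate fix counts as correct only if every associated test case passes.

def run_tests(candidate_src: str, test_cases) -> bool:
    """Exec the candidate repair and check it against (args, expected) pairs."""
    namespace = {}
    try:
        exec(candidate_src, namespace)   # run the repaired snippet
        fn = namespace["solution"]       # assumed entry-point name
        return all(fn(*args) == expected for args, expected in test_cases)
    except Exception:
        return False                     # crashes count as failed repairs

buggy = "def solution(xs):\n    return sum(xs[1:])\n"  # drops first element
fixed = "def solution(xs):\n    return sum(xs)\n"
tests = [(([1, 2, 3],), 6), (([],), 0)]

print(run_tests(buggy, tests), run_tests(fixed, tests))  # → False True
```

Execution-based checking like this is what distinguishes APR evaluation from the CR and BI tasks, which are typically scored as classification or identification accuracy.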
Problem

Research questions and friction points this paper is trying to address.

Develops a massively multilingual code debugging benchmark
Introduces the MDEVAL-INSTRUCT debugging instruction corpus
Trains xDebugCoder to handle bugs across diverse programming languages
Innovation

Methods, ideas, or system contributions that make the work stand out.

Creation of a multilingual debugging benchmark (3.6K samples, 18 languages)
Development of a debugging instruction corpus via bug injection (xDebugGen)
Training of xDebugCoder as a strong multilingual debugging baseline
Authors
Shukai Liu (Beihang University)
Linzheng Chai (Beihang University)
Jian Yang (CCSE, Beihang University)
Jiajun Shi (CCSE, Beihang University)
He Zhu (CCSE, Beihang University)
Liran Wang (CCSE, Beihang University)
Ke Jin (Beijing Institute of Technology)
Wei Zhang (CCSE, Beihang University)
Hualei Zhu (CCSE, Beihang University)
Shuyue Guo (CCSE, Beihang University)
Tao Sun (CCSE, Beihang University)
Jiaheng Liu (CCSE, Beihang University)
Yunlong Duan (CCSE, Beihang University)
Yu Hao (CCSE, Beihang University)
Liqun Yang (CCSE, Beihang University)
Guanglin Niu (Beihang University)
Ge Zhang (CCSE, Beihang University)
Zhoujun Li (Beihang University)