Across Programming Language Silos: A Study on Cross-Lingual Retrieval-augmented Code Generation

📅 2025-06-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing research on Retrieval-Augmented Code Generation (RACG) is confined to monolingual settings, leaving the effectiveness and safety of cross-lingual transfer systematically unexplored. Method: We introduce the first multilingual RACG benchmark, covering 13 programming languages and roughly 14,000 samples; propose a cross-lingual adversarial data construction method; develop domain-adapted code embedding and retrieval models; and establish a unified evaluation protocol. Contributions/Results: Key findings include: (1) Java significantly outperforms Python in cross-lingual RACG, revealing a utility imbalance across languages; (2) certain adversarial perturbations paradoxically improve performance; and (3) domain-specific retrievers substantially surpass general-purpose text retrievers. Experiments demonstrate that multilingual RACG improves generation quality; the work provides the first quantitative characterization of robustness disparities between monolingual and cross-lingual settings and publicly releases the benchmark dataset and analytical framework.

📝 Abstract
Current research on large language models (LLMs) with retrieval-augmented code generation (RACG) mainly focuses on single-language settings, leaving cross-lingual effectiveness and security unexplored. Multi-lingual RACG systems are valuable for migrating codebases across programming languages (PLs), yet they face risks from error propagation (e.g., adversarial data corruption) in cross-lingual transfer. We construct a dataset spanning 13 PLs with nearly 14k instances to explore the utility and robustness of multi-lingual RACG systems. Our investigation reveals four key insights: (1) Effectiveness: multi-lingual RACG significantly enhances generation by multi-lingual code LLMs; (2) Inequality: Java demonstrates superior cross-lingual utility over Python in RACG; (3) Robustness: adversarial attacks degrade performance significantly in mono-lingual RACG but have a mitigated impact in cross-lingual scenarios; counterintuitively, perturbed code may even improve RACG in cross-lingual settings; (4) Specialization: domain-specific code retrievers significantly outperform general text retrievers. These findings establish a foundation for developing effective and secure multi-lingual code assistants.
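The retrieve-then-generate loop that RACG refers to can be sketched in outline. The token-overlap scoring, corpus format, and prompt template below are illustrative assumptions for exposition, not the paper's actual retriever or benchmark:

```python
import re

# Minimal sketch of a cross-lingual RACG retrieval step (assumed setup,
# not the paper's method): retrieve reference snippets, possibly written
# in other PLs, and prepend them to the generation prompt for a code LLM.

def tokenize(text):
    """Crude identifier/keyword tokenizer shared across programming languages."""
    return set(re.findall(r"[A-Za-z_]+", text.lower()))

def retrieve(query, corpus, k=1):
    """Rank snippets by Jaccard token overlap with the query (toy retriever)."""
    def score(doc):
        q, d = tokenize(query), tokenize(doc["code"])
        return len(q & d) / len(q | d) if q | d else 0.0
    return sorted(corpus, key=score, reverse=True)[:k]

def build_prompt(query, retrieved):
    """Prepend retrieved cross-lingual snippets as context before the task."""
    context = "\n\n".join(
        f"# Reference ({doc['lang']}):\n{doc['code']}" for doc in retrieved
    )
    return f"{context}\n\n# Task:\n{query}\n"

# Tiny hypothetical corpus mixing languages.
corpus = [
    {"lang": "java",
     "code": "int factorial(int n) { return n <= 1 ? 1 : n * factorial(n - 1); }"},
    {"lang": "go",
     "code": "func sum(xs []int) int { t := 0; for _, x := range xs { t += x }; return t }"},
]
query = "write a python factorial function"
top = retrieve(query, corpus, k=1)
prompt = build_prompt(query, top)  # prompt would then be fed to a code LLM
```

A real system would replace the Jaccard scorer with a dense code embedding model (one of the paper's contributions is exactly such a domain-adapted retriever) and pass the assembled prompt to an LLM.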
Problem

Research questions and friction points this paper is trying to address.

Explores cross-lingual effectiveness and security in retrieval-augmented code generation
Investigates utility and robustness of multi-lingual RACG systems across 13 PLs
Addresses risks from error propagation in cross-lingual code transfer
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-lingual RACG enhances code generation
Java outperforms Python in cross-lingual utility
Domain-specific retrievers beat general text retrievers
Qiming Zhu
Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences
Large Language Models · AI for SE
Jialun Cao
The Hong Kong University of Science and Technology
SE for AI · AI for SE
Xuanang Chen
Institute of Software, Chinese Academy of Sciences
Information Retrieval · Natural Language Processing
Yaojie Lu
Institute of Software, Chinese Academy of Sciences
Information Extraction · Large Language Models
Hongyu Lin
Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences, Beijing, China
Xianpei Han
Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Le Sun
Institute of Software, CAS
Information Retrieval · Natural Language Processing
S. Cheung
The Hong Kong University of Science and Technology, Hong Kong, China