I Can't Share Code, but I need Translation -- An Empirical Study on Code Translation through Federated LLM

📅 2025-01-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To balance privacy preservation and model performance in enterprise-level cross-language code migration, this paper proposes FedLLM, a federated large language model approach to code translation and one of the first empirical studies of its kind. Built on the federated learning paradigm, FedLLM lets multiple parties collaboratively fine-tune CodeLlama without sharing raw source code, integrating cross-lingual code alignment and instruction tuning so that training is distributed and no sensitive code is uploaded. Experimental evaluation on bidirectional C#↔Java translation shows that FedLLM improves CodeBLEU scores by more than 40% over individual client models. The results provide early empirical evidence that federated learning can protect intellectual property while improving translation quality, and suggest FedLLM as a promising direction for code translation in privacy-sensitive industrial settings.

📝 Abstract

Owing to the rapid evolution of technologies and project requirements, organizations need to upgrade the code base of their software projects to a new version of the programming language, or even translate it into an entirely new one. However, code translation is resource-intensive and requires expertise in both the source and target languages. While researchers have made progress in automating translation between legacy and modern languages, recent work has increasingly turned to pre-trained Large Language Models (LLMs) to translate code efficiently. Given the proprietary nature of code, organizations prefer fine-tuning LLMs locally rather than relying on external APIs. This is one of the first empirical studies to propose a Federated LLM-based approach for code translation. The proposed approach enables clients to jointly train a code translator without sharing sensitive data. This study demonstrates that participants can collaboratively develop a FedLLM for efficient code translation (particularly C# to Java and vice versa) with superior results (more than 40% improvement in CodeLLaMA's CodeBLEU score) compared to individual client models. Our findings indicate that FedLLM offers a collaborative approach to code translation and could serve as a promising direction for future research in this field.
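
The abstract does not say how client updates are combined, so the following is only a minimal sketch of one common choice, FedAvg-style weighted averaging, written in plain Python. The function name `fed_avg`, the parameter name `"adapter"`, and the client dataset sizes are all hypothetical and are not taken from the paper. The point it illustrates is that only model parameters leave each client, never source code.

```python
from typing import Dict, List

# A client's trainable parameters: parameter name -> flat list of values.
# In practice these would be tensors (e.g. fine-tuned adapter weights);
# plain lists keep the sketch dependency-free.
Weights = Dict[str, List[float]]


def fed_avg(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Average client parameters, weighting each client by its local dataset size.

    Assumption: FedAvg-style aggregation. The paper's abstract does not
    specify the aggregation rule actually used.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one dataset size per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of training examples must be positive")

    aggregated: Weights = {}
    for name, reference in client_weights[0].items():
        aggregated[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(len(reference))
        ]
    return aggregated


# Two hypothetical clients that fine-tuned locally on private C#/Java pairs.
# Only these parameter values are sent to the server.
client_a = {"adapter": [1.0, 2.0]}  # trained on 100 local examples
client_b = {"adapter": [3.0, 6.0]}  # trained on 300 local examples

global_weights = fed_avg([client_a, client_b], [100, 300])
print(global_weights)  # {'adapter': [2.5, 5.0]}
```

In a full training loop the server would send `global_weights` back to every client, each client would fine-tune further on its own data, and the cycle would repeat for several rounds.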

Problem

Research questions and friction points this paper is trying to address.

Privacy-Preserving

Code Translation

Language Model

Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated Learning

Large Language Models

Code Translation

🔎 Similar Papers

Exploring the Impact of the Output Format on the Evaluation of Large Language Models for Code Translation