AI Summary
To address the challenge of balancing privacy preservation and model performance in enterprise-level cross-lingual code migration, this paper proposes FedLLM, one of the first federated large language model frameworks tailored for code translation. Built on the federated learning paradigm, FedLLM lets multiple parties collaboratively fine-tune CodeLlama without sharing raw source code, combining cross-lingual code alignment with instruction tuning so that no sensitive code leaves a client during distributed training. In experiments on bidirectional C#↔Java translation, FedLLM improves CodeBLEU scores by more than 40% over single-client baseline models. The work offers early empirical evidence that federated learning can protect intellectual property while also improving translation quality in code generation tasks, and it points to a promising approach for multilingual software evolution in privacy-sensitive industrial settings.
Abstract
Owing to the rapid evolution of technologies and project requirements, organizations need to upgrade the code base in their software projects to a new version of the programming language or even translate it to an entirely new one. However, code translation is resource-intensive and requires expertise in both the source and target languages. While researchers have made progress in automating translations between legacy and modern languages, recent work has increasingly turned to pre-trained Large Language Models (LLMs) to translate code efficiently. Given the proprietary nature of code, organizations prefer fine-tuning LLMs locally rather than relying on external APIs. This is one of the first empirical studies to propose a Federated LLM-based approach for code translation. The proposed approach enables clients to jointly train a code translator without sharing sensitive data. This study demonstrates that participants can collaboratively develop a FedLLM for efficient code translation (particularly C# to Java and vice versa) with superior results (more than 40% improvement in CodeLLaMA's CodeBLEU score) compared to individual client models. Our findings indicate that FedLLM offers a collaborative approach to code translation and could serve as a promising direction for future research in this field.
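
The abstract does not say how client updates are combined, so the following is only a minimal sketch of one common choice, FedAvg-style weighted averaging, and not the authors' implementation. The function `federated_average`, the parameter name `adapter.weight`, and the sample counts are all hypothetical. It illustrates that only fine-tuned parameters, never source code, would need to reach the aggregator.

```python
from typing import Dict, List

# A client's update: parameter name -> flat list of values.
# In practice these would be tensors (for example adapter weights);
# plain lists keep this sketch dependency-free.
Weights = Dict[str, List[float]]


def federated_average(client_weights: List[Weights],
                      client_sizes: List[int]) -> Weights:
    """Average client parameters, weighted by local sample count.

    This is FedAvg-style aggregation, assumed here for illustration;
    the source does not state which aggregation rule FedLLM uses.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client update")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    averaged: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Hypothetical updates from two clients after local fine-tuning.
client_a = {"adapter.weight": [1.0, 2.0]}  # trained on 1 local sample
client_b = {"adapter.weight": [3.0, 6.0]}  # trained on 3 local samples

global_weights = federated_average([client_a, client_b], [1, 3])
print(global_weights)  # {'adapter.weight': [2.5, 5.0]}
```

Weighting by sample count lets clients with more local translation pairs contribute proportionally more to the shared model; an unweighted mean would be an equally valid assumption for this sketch.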