Skeleton-Guided-Translation: A Benchmarking Framework for Code Repository Translation with Fine-Grained Quality Evaluation

📅 2025-01-27
🤖 AI Summary
Existing repository-level translation methods struggle to ensure inter-module consistency, dependency correctness, and fine-grained quality assessment, which limits the applicability of large language models (LLMs) to enterprise-scale Java-to-C# migration. The paper proposes a skeleton-guided two-stage translation paradigm that enforces structural alignment first and then lets the LLM translate the full code against that structure. It introduces TRANSREPO-BENCH, described as the first repository-level benchmark supporting reproducible builds and unit testing for cross-language migration, and designs evaluation metrics at the level of individual test cases to overcome the limitations of conventional binary pass/fail assessment. Experimental results are reported to show substantial improvements in build success rate and test pass rate. The analysis also identifies and quantifies a critical bottleneck: mainstream LLMs consistently fail to preserve interface consistency across modules, a key challenge for scalable, production-grade migration.
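To make the contrast between binary and test-case-level evaluation concrete, here is a minimal sketch in Python. It is not the paper's metric, whose exact definition is not given on this page. The names `CaseResult`, `binary_score`, and `fine_grained_score` are hypothetical, and the sketch assumes each test case can be built and run independently of the others.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CaseResult:
    """Outcome of one unit test case (hypothetical record, not from the paper)."""
    name: str
    built: bool   # assumption: the code this case depends on compiled
    passed: bool  # the test assertion succeeded at run time


def binary_score(results: List[CaseResult]) -> float:
    """Coarse metric: 1.0 only if every case builds and passes, else 0.0."""
    if not results:
        return 0.0
    return 1.0 if all(r.built and r.passed for r in results) else 0.0


def fine_grained_score(results: List[CaseResult]) -> float:
    """Per-case metric: fraction of cases that both build and pass."""
    if not results:
        return 0.0
    ok = sum(1 for r in results if r.built and r.passed)
    return ok / len(results)


results = [
    CaseResult("testAdd", built=True, passed=True),
    CaseResult("testSub", built=True, passed=True),
    CaseResult("testMul", built=True, passed=True),
    CaseResult("testDiv", built=False, passed=False),  # one build failure
]
print(binary_score(results), fine_grained_score(results))
```

Under these assumptions a single build failure drives the binary score to zero, while the per-case score still reflects the three test cases that work.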

📝 Abstract
The advancement of large language models has intensified the need to modernize enterprise applications and migrate legacy systems to secure, versatile languages. However, existing code translation benchmarks primarily focus on individual functions, overlooking the complexities involved in translating entire repositories, such as maintaining inter-module coherence and managing dependencies. While some recent repository-level translation benchmarks attempt to address these challenges, they still face limitations, including poor maintainability and overly coarse evaluation granularity, which make them less developer-friendly. We introduce Skeleton-Guided-Translation, a framework for repository-level Java-to-C# code translation with fine-grained quality evaluation. It uses a two-step process: first translating the repository's structural "skeletons", then translating the full repository guided by these skeletons. Building on this, we present TRANSREPO-BENCH, a benchmark of high-quality open-source Java repositories and their corresponding C# skeletons, including matching unit tests and build configurations. Our unit tests are fixed and can be applied across multiple or incremental translations without manual adjustments, enhancing automation and scalability in evaluations. Additionally, we develop fine-grained evaluation metrics that assess translation quality at the individual test case level, addressing the inability of traditional binary metrics to distinguish cases in which a build failure causes all tests to fail. Evaluations using TRANSREPO-BENCH highlight key challenges and advance more accurate repository-level code translation.
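The first step of the two-step process can be pictured with a toy skeleton extractor. This is only a sketch under stated assumptions, not the authors' tooling: the abstract does not say exactly what a skeleton contains, so the code assumes it means declarations with method bodies removed. The function `extract_skeleton` and its `placeholder` parameter are hypothetical, and the brace counting assumes a single top-level class with no nested types, no array initializers in fields, and no braces inside strings or comments.

```python
def extract_skeleton(java_source: str,
                     placeholder: str = " /* body omitted */ ") -> str:
    """Keep class-level declarations and replace each method body.

    Depth 0 is file level, depth 1 is the class body, depth 2 and deeper
    is inside a method body (toy assumption: one flat top-level class).
    """
    out = []
    depth = 0
    for ch in java_source:
        if ch == "{":
            depth += 1
            if depth <= 2:
                out.append(ch)           # keep class and method opening braces
            if depth == 2:
                out.append(placeholder)  # stand-in for the stripped body
        elif ch == "}":
            if depth <= 2:
                out.append(ch)           # keep the matching closing braces
            depth -= 1
        elif depth < 2:
            out.append(ch)               # text outside method bodies is kept
    return "".join(out)


java = """public class Calculator {
    private int total;

    public int add(int a, int b) {
        if (a > 0) { total += a; }
        return a + b;
    }
}"""
print(extract_skeleton(java))
```

In a second step, a translator (for example an LLM) would be given both the original source and the already translated skeleton, so that the signatures it must fill in are fixed in advance. A real implementation would use a Java parser rather than brace counting.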
Problem

Research questions and friction points this paper is trying to address.

Code Translation
Large Codebase
Translation Quality Assessment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Skeleton-Guided-Translation
TRANSREPO-BENCH
Fine-Grained Code Translation
Xing Zhang, Peking University
Jiaheng Wen, Zhejiang University
Fangkai Yang, Microsoft
Pu Zhao, Microsoft
Yu Kang, Microsoft
Junhao Wang, Tongji University
Maoquan Wang, Microsoft
Yufan Huang, Microsoft
Elsie Nallipogu, Microsoft
Qingwei Lin, Microsoft
Yingnong Dang, Microsoft (cloud service, data analytics, software analytics, machine learning, human-computer interaction)
S. Rajmohan, Microsoft
Dongmei Zhang, Microsoft Research (software engineering, machine learning, information visualization)
Qi Zhang, Microsoft