🤖 AI Summary
Existing repository-level code translation methods struggle to ensure inter-module consistency, dependency correctness, and fine-grained quality assessment, which limits the applicability of large language models (LLMs) to enterprise-scale Java-to-C# migration. This paper proposes a skeleton-guided, two-stage translation paradigm that enforces structural alignment while enabling precise LLM-driven translation. We introduce TRANSREPO-BENCH, the first repository-level benchmark supporting reproducible builds and unit testing for cross-language migration. Furthermore, we design fine-grained, test-case-level evaluation metrics that overcome the limitations of conventional binary pass/fail assessments. Experimental results demonstrate substantial improvements in build success rate and test pass rate. Notably, our analysis is the first to systematically identify and quantify a critical bottleneck: mainstream LLMs consistently fail to preserve interface consistency across modules, a key challenge for scalable, production-grade migration.
📝 Abstract
The advancement of large language models has intensified the need to modernize enterprise applications and migrate legacy systems to secure, versatile languages. However, existing code translation benchmarks primarily focus on individual functions, overlooking the complexities involved in translating entire repositories, such as maintaining inter-module coherence and managing dependencies. While some recent repository-level translation benchmarks attempt to address these challenges, they still face limitations, including poor maintainability and overly coarse evaluation granularity, which make them less developer-friendly. We introduce Skeleton-Guided-Translation, a framework for repository-level Java-to-C# code translation with fine-grained quality evaluation. It uses a two-step process: first translating the repository's structural "skeletons", then translating the full repository guided by these skeletons. Building on this, we present TRANSREPO-BENCH, a benchmark of high-quality open-source Java repositories and their corresponding C# skeletons, including matching unit tests and build configurations. Our unit tests are fixed and can be applied across multiple or incremental translations without manual adjustments, enhancing automation and scalability in evaluations. Additionally, we develop fine-grained evaluation metrics that assess translation quality at the level of individual test cases, addressing the inability of traditional binary metrics to tell translations apart when a build failure causes every test to fail. Evaluations using TRANSREPO-BENCH highlight key challenges and help advance more accurate repository-level code translation.
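To illustrate why test-case-level scoring can be more informative than a binary pass/fail verdict, the following is a minimal sketch, not the paper's actual metric. The names `CaseResult`, `binary_score`, and `per_test_pass_rate`, as well as the sample data, are hypothetical and introduced only for this example. It assumes each test case can be built and run independently, so that a build failure affecting one test does not erase the results of the others.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CaseResult:
    """Outcome of one unit test case (hypothetical structure for illustration)."""
    name: str
    built: bool   # did the translated code needed by this test compile?
    passed: bool  # did the test pass? Only meaningful when built is True.


def binary_score(results: List[CaseResult]) -> float:
    """Coarse metric: 1.0 only if every test built and passed, otherwise 0.0."""
    if not results:
        return 0.0
    return 1.0 if all(r.built and r.passed for r in results) else 0.0


def per_test_pass_rate(results: List[CaseResult]) -> float:
    """Fine-grained metric: fraction of test cases that both built and passed."""
    if not results:
        return 0.0
    return sum(1 for r in results if r.built and r.passed) / len(results)


# Translation A: three tests pass; one cannot be built.
translation_a = [
    CaseResult("testAdd", built=True, passed=True),
    CaseResult("testSub", built=True, passed=True),
    CaseResult("testMul", built=True, passed=True),
    CaseResult("testParse", built=False, passed=False),
]

# Translation B: nothing builds, so nothing passes.
translation_b = [
    CaseResult("testAdd", built=False, passed=False),
    CaseResult("testSub", built=False, passed=False),
    CaseResult("testMul", built=False, passed=False),
    CaseResult("testParse", built=False, passed=False),
]

if __name__ == "__main__":
    for label, results in (("A", translation_a), ("B", translation_b)):
        print(label, binary_score(results), per_test_pass_rate(results))
    # prints: A 0.0 0.75
    #         B 0.0 0.0
```

Under the binary metric both translations score 0.0 and look equally bad, while the per-test rate separates the mostly correct translation A (0.75) from the fully broken translation B (0.0).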