🤖 AI Summary
Large language models (LLMs) perform poorly on repository-level code translation because they struggle to model complex inter-procedural dependencies and repository context, which hinders industrial deployment. To address this, the authors propose K-Trans, a triple-knowledge-enhanced framework that jointly leverages target-language codebase samples, structural information from the source repository, and previously successful function-pair translations, together with a dependency-aware exemplar prompting mechanism. An LLM-based self-debugging module and a knowledge-base self-evolution module further enable context-aware prompt construction and dynamic knowledge refinement. Experiments show substantial gains over strong baselines: +19.4% relative in pass@1 and +0.138 in CodeBLEU. Ablation studies confirm that dependency usage examples yield the largest gains, and iterative knowledge-base updates steadily improve translation accuracy and contextual coherence across successive refinements.
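The summary names "dependency-aware exemplar prompting" but not how exemplars are selected. A minimal sketch of one plausible reading, assuming past translations are ranked by dependency-set overlap (the `TranslatedPair` record, Jaccard metric, and `rank_dependency_exemplars` helper are illustrative, not taken from the paper):

```python
from dataclasses import dataclass, field

@dataclass
class TranslatedPair:
    source_fn: str                               # source-language function body
    target_fn: str                               # previously accepted translation
    deps: set[str] = field(default_factory=set)  # names of callees/types it uses

def rank_dependency_exemplars(pairs: list[TranslatedPair],
                              query_deps: set[str],
                              k: int = 3) -> list[TranslatedPair]:
    """Rank past translations by how much their dependency set overlaps
    (Jaccard similarity) with the function being translated; keep the top k."""
    def jaccard(a: set[str], b: set[str]) -> float:
        union = a | b
        return len(a & b) / len(union) if union else 0.0
    return sorted(pairs, key=lambda p: jaccard(p.deps, query_deps), reverse=True)[:k]

# Example: prefer exemplars sharing dependencies with a function that calls parse() and log().
kb = [TranslatedPair("fn a() {...}", "def a(): ...", {"parse", "log"}),
      TranslatedPair("fn b() {...}", "def b(): ...", {"render"})]
print(rank_dependency_exemplars(kb, {"parse", "log", "retry"})[0].target_fn)
```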
📝 Abstract
Large language models (LLMs) have performed well on function-level code translation without repository-level context. However, their performance on repository-level code translation remains suboptimal due to complex dependencies and context, hindering their adoption in industrial settings. In this work, we propose K-Trans, a novel LLM-based code translation technique that leverages triple knowledge augmentation to improve translation quality under repository context in real-world software development. First, K-Trans constructs a translation knowledge base by extracting relevant information from target-language codebases, the repository being translated, and prior translation results. Second, for each function to be translated, K-Trans retrieves relevant triple knowledge, including target-language code samples, dependency usage examples, and successfully translated function pairs, as references to guide the LLM. Third, K-Trans builds a knowledge-augmented translation prompt from the retrieved triple knowledge and employs the LLM to generate the translated code while preserving repository context; it further uses the LLM for self-debugging to improve translation correctness. Experiments show that K-Trans substantially outperforms a baseline adapted from previous work, with 19.4%/40.2% relative improvement in pass@1 and a 0.138 gain in CodeBLEU. The results also demonstrate that each type of knowledge contributes significantly to K-Trans's effectiveness on repository-level code translation, with dependency usage examples making the most notable contribution. Moreover, as the self-evolution process progresses, the knowledge base continuously enhances the LLM's performance across various aspects of repository-level code translation.
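To make the three-step pipeline concrete, here is a minimal sketch of knowledge-augmented prompt construction plus the self-debugging loop, in Python. The prompt layout, the helper names, and the caller-supplied `llm` and `run_tests` callables are assumptions for illustration, not the paper's implementation:

```python
from typing import Callable

def build_translation_prompt(source_fn: str,
                             target_samples: list[str],
                             dep_examples: list[str],
                             pair_exemplars: list[tuple[str, str]]) -> str:
    """Assemble the triple knowledge plus the function to translate into one prompt."""
    parts = ["### Target-language code samples", *target_samples,
             "### Dependency usage examples", *dep_examples,
             "### Successful translation pairs",
             *(f"SOURCE:\n{s}\nTARGET:\n{t}" for s, t in pair_exemplars),
             "### Translate this function, preserving repository context",
             source_fn]
    return "\n\n".join(parts)

def translate_with_self_debug(source_fn: str,
                              knowledge: tuple[list[str], list[str], list[tuple[str, str]]],
                              llm: Callable[[str], str],
                              run_tests: Callable[[str], tuple[bool, str]],
                              max_rounds: int = 3) -> tuple[str, bool]:
    """Translate once, then feed build/test failures back to the LLM
    until the candidate passes or the retry budget runs out."""
    prompt = build_translation_prompt(source_fn, *knowledge)
    candidate = llm(prompt)
    for _ in range(max_rounds):
        ok, error_log = run_tests(candidate)
        if ok:
            return candidate, True
        candidate = llm(f"{prompt}\n\n### Previous attempt\n{candidate}"
                        f"\n\n### Failures to fix\n{error_log}")
    return candidate, False
```

In this reading, accepted `(source_fn, candidate)` pairs would be written back into the knowledge base, which is one way the described self-evolution could compound across translation rounds.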