🤖 AI Summary
Large language models (LLMs) perform poorly on repository-level code translation because they struggle to model complex inter-procedural dependencies and repository context, which hinders industrial deployment. To address this, the authors propose K-Trans, a triple-knowledge-enhanced framework that jointly leverages target-language codebase samples, structural information from the source repository, and previously successful function-pair translations, together with a dependency-aware exemplar prompting mechanism. An LLM-based self-debugging module and a knowledge-base self-evolution module further enable context-aware prompt construction and dynamic knowledge refinement. Experiments show substantial gains over strong baselines: +19.4% relative in pass@1 and +0.138 in CodeBLEU. Ablation studies confirm that dependency usage examples yield the largest gains, and iterative knowledge-base updates steadily improve translation accuracy and contextual coherence across successive refinements.
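The summary names "dependency-aware exemplar prompting" but not how exemplars are selected. A minimal sketch of one plausible reading, assuming past translations are ranked by dependency-set overlap (the `TranslatedPair` record, Jaccard metric, and `rank_dependency_exemplars` helper are illustrative, not taken from the paper):

```python
from dataclasses import dataclass, field

@dataclass
class TranslatedPair:
    source_fn: str                               # source-language function body
    target_fn: str                               # previously accepted translation
    deps: set[str] = field(default_factory=set)  # names of callees/types it uses

def rank_dependency_exemplars(pairs: list[TranslatedPair],
                              query_deps: set[str],
                              k: int = 3) -> list[TranslatedPair]:
    """Rank past translations by how much their dependency set overlaps
    (Jaccard similarity) with the function being translated; keep the top k."""
    def jaccard(a: set[str], b: set[str]) -> float:
        union = a | b
        return len(a & b) / len(union) if union else 0.0
    return sorted(pairs, key=lambda p: jaccard(p.deps, query_deps), reverse=True)[:k]

# Example: prefer exemplars sharing dependencies with a function that calls parse() and log().
kb = [TranslatedPair("fn a() {...}", "def a(): ...", {"parse", "log"}),
      TranslatedPair("fn b() {...}", "def b(): ...", {"render"})]
print(rank_dependency_exemplars(kb, {"parse", "log", "retry"})[0].target_fn)
```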
📝 Abstract
Large language models (LLMs) have performed well on function-level code translation without repository-level context. However, their performance on repository-level code translation remains suboptimal due to complex dependencies and context, hindering their adoption in industrial settings. In this work, we propose K-Trans, a novel LLM-based code translation technique that leverages triple knowledge augmentation to improve translation quality under repository context in real-world software development. First, K-Trans constructs a translation knowledge base by extracting relevant information from target-language codebases, the repository being translated, and prior translation results. Second, for each function to be translated, K-Trans retrieves relevant triple knowledge, including target-language code samples, dependency usage examples, and successfully translated function pairs, as references to guide the LLM. Third, K-Trans builds a knowledge-augmented translation prompt from the retrieved triple knowledge and employs the LLM to generate the translated code while preserving repository context; it further uses the LLM for self-debugging to improve translation correctness. Experiments show that K-Trans substantially outperforms a baseline adapted from previous work, with 19.4%/40.2% relative improvement in pass@1 and a 0.138 gain in CodeBLEU. The results also demonstrate that each type of knowledge contributes significantly to K-Trans's effectiveness on repository-level code translation, with dependency usage examples making the most notable contribution. Moreover, as the self-evolution process progresses, the knowledge base continuously enhances the LLM's performance across various aspects of repository-level code translation.
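To make the three-step pipeline concrete, here is a minimal sketch of knowledge-augmented prompt construction plus the self-debugging loop, in Python. The prompt layout, the helper names, and the caller-supplied `llm` and `run_tests` callables are assumptions for illustration, not the paper's implementation:

```python
from typing import Callable

def build_translation_prompt(source_fn: str,
                             target_samples: list[str],
                             dep_examples: list[str],
                             pair_exemplars: list[tuple[str, str]]) -> str:
    """Assemble the triple knowledge plus the function to translate into one prompt."""
    parts = ["### Target-language code samples", *target_samples,
             "### Dependency usage examples", *dep_examples,
             "### Successful translation pairs",
             *(f"SOURCE:\n{s}\nTARGET:\n{t}" for s, t in pair_exemplars),
             "### Translate this function, preserving repository context",
             source_fn]
    return "\n\n".join(parts)

def translate_with_self_debug(source_fn: str,
                              knowledge: tuple[list[str], list[str], list[tuple[str, str]]],
                              llm: Callable[[str], str],
                              run_tests: Callable[[str], tuple[bool, str]],
                              max_rounds: int = 3) -> tuple[str, bool]:
    """Translate once, then feed build/test failures back to the LLM
    until the candidate passes or the retry budget runs out."""
    prompt = build_translation_prompt(source_fn, *knowledge)
    candidate = llm(prompt)
    for _ in range(max_rounds):
        ok, error_log = run_tests(candidate)
        if ok:
            return candidate, True
        candidate = llm(f"{prompt}\n\n### Previous attempt\n{candidate}"
                        f"\n\n### Failures to fix\n{error_log}")
    return candidate, False
```

In this reading, accepted `(source_fn, candidate)` pairs would be written back into the knowledge base, which is one way the described self-evolution could compound across translation rounds.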