🤖 AI Summary
Large language models (LLMs) frequently produce buggy code in translation tasks. This paper investigates whether intermediate representations (IRs) can help: source code is first mapped to a semantically explicit natural-language (NL) summary or encoded via its abstract syntax tree (AST), and these representations are integrated into prompts ranging from one-shot to chain-of-thought (CoT), enabling stepwise, more interpretable reasoning than conventional end-to-end prompting. Evaluated with Open Gpt4 8X7B and the specialized StarCoder and CodeGen models on the CodeNet and AVATAR benchmarks, CoT prompting with an intermediate NL summary performs best, improving successful translations by 13.8% (CodeNet) and 6.7% (AVATAR) over a zero-shot prompt for the best-performing model, Open Gpt4 8X7B. The core contribution is a systematic comparison of NL and AST IRs under different prompting strategies, showing that NL summaries provide the most effective guidance for translation accuracy.
📝 Abstract
Studies show that large language models (LLMs) produce buggy code translations. One avenue to improve translation accuracy is through intermediate representations, which could provide structured insights to guide the model's understanding. We explore whether code translation using LLMs can benefit from intermediate representations via natural language (NL) and abstract syntax trees (ASTs). Since prompt engineering greatly affects LLM performance, we consider several ways to integrate these representations, from one-shot to chain-of-thought (CoT) prompting. Using Open Gpt4 8X7B and specialized StarCoder and CodeGen models on popular code translation benchmarks (CodeNet and AVATAR), we find that CoT with an intermediate NL summary performs best, with an increase of 13.8% and 6.7%, respectively, in successful translations for the best-performing model (Open Gpt4 8X7B) compared to the zero-shot prompt.
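The best-performing strategy described above, chain-of-thought prompting with an intermediate NL summary, can be sketched as a two-step prompt pipeline. This is an illustrative reconstruction, not the paper's actual prompts; the `llm` call interface and all prompt wording are assumptions.

```python
def summary_prompt(source_code: str, source_lang: str) -> str:
    """Step 1: ask the model to produce an NL summary of the source code."""
    return (
        f"Describe, in plain English, what the following {source_lang} "
        f"code does:\n\n{source_code}\n"
    )

def translation_prompt(source_code: str, nl_summary: str,
                       source_lang: str, target_lang: str) -> str:
    """Step 2: translate, conditioning on both the code and its NL summary."""
    return (
        f"{source_lang} code:\n{source_code}\n\n"
        f"Summary of its behavior:\n{nl_summary}\n\n"
        f"Using the summary as a guide, translate the code to {target_lang}. "
        f"Think step by step before writing the final program.\n"
    )

# Toy example (the summary text would normally come from a first LLM call,
# e.g. nl = llm(summary_prompt(src, "Python")) with an assumed llm interface):
src = "print(sum(int(x) for x in input().split()))"
nl = "Reads whitespace-separated integers from stdin and prints their sum."
p2 = translation_prompt(src, nl, "Python", "Java")
# p2 is then sent as the second-stage prompt to the translation model.
```

Keeping the two steps as separate prompts (rather than one combined prompt) is what makes the NL summary an explicit, inspectable intermediate representation: it can be checked or corrected before the translation step runs.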