On Effective Semantic Translation for Code: A Study Based on Pseudocode

📅 2025-10-01


🤖 AI Summary

Literal (direct) code translation suffers from low accuracy when the source and target languages differ substantially, when target-language resources are limited (e.g., Rust), or when the target language has a strict type system. To address this, we propose a pseudocode-driven semantic translation paradigm: source code is first abstracted into language-agnostic, logically clear pseudocode, which is then translated into target-language code. This approach decouples program semantics from their syntactic realization, reducing interference from language-specific details. An empirical evaluation on 9,690 cross-language translation tasks spanning six programming languages, using five state-of-the-art large language models, shows that pseudocode-based translation significantly improves translation success rates, especially in low-resource and strongly typed settings, and that its performance is complementary to that of direct translation. Both automated and human evaluations corroborate its effectiveness. This work provides the first systematic empirical validation of pseudocode as a structurally advantageous and practically valuable intermediate representation for code translation.
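The two-stage pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompt wording, the function names (`to_pseudocode`, `from_pseudocode`, `pseudocode_translate`), and the `LLM` callable type are all hypothetical, and a stub model stands in for a real LLM so the sketch runs offline.

```python
from typing import Callable, List, Tuple

# Hypothetical type: any function that maps a prompt string to a model completion.
LLM = Callable[[str], str]


def to_pseudocode(source: str, src_lang: str, llm: LLM) -> str:
    """Stage 1: abstract the program's intent and logic into language-agnostic pseudocode."""
    prompt = (
        f"Describe the intent and logic of this {src_lang} program as "
        f"language-agnostic pseudocode, omitting {src_lang}-specific details.\n\n{source}"
    )
    return llm(prompt)


def from_pseudocode(pseudocode: str, tgt_lang: str, llm: LLM) -> str:
    """Stage 2: implement the pseudocode as a program in the target language."""
    prompt = f"Implement this pseudocode as a complete {tgt_lang} program.\n\n{pseudocode}"
    return llm(prompt)


def pseudocode_translate(source: str, src_lang: str, tgt_lang: str, llm: LLM) -> Tuple[str, str]:
    """Run both stages and return (pseudocode, target_program)."""
    pseudo = to_pseudocode(source, src_lang, llm)
    return pseudo, from_pseudocode(pseudo, tgt_lang, llm)


# Usage with a stub model that records each prompt and returns a numbered placeholder.
prompts: List[str] = []


def fake_llm(prompt: str) -> str:
    prompts.append(prompt)
    return f"<completion {len(prompts)}>"


pseudo, target = pseudocode_translate("print(sum(range(10)))", "Python", "Rust", fake_llm)
print(pseudo)  # <completion 1>
print(target)  # <completion 2>
```

In this sketch, stage 2 receives only the pseudocode, not the original source, which is one way to keep source-language details out of the target implementation; whether the paper's prompts also pass the original program along is not stated here.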

📝 Abstract

Large language models (LLMs) show great potential in code translation. However, accurate translation remains challenging with the commonly adopted direct code-to-code approach, which converts a program into the target programming language (PL) in a single step. Inspired by the success of incorporating intermediate steps to guide LLMs through challenging tasks, we explore pseudocode-based code translation, which emulates how humans perform semantic translation: first interpreting the program's intent and logic as pseudocode, then implementing that pseudocode in the target PL. We find that pseudocode-based translation helps translate programs that direct translation struggles to handle. Nonetheless, the effectiveness, advantages, and limitations of this approach remain underexplored. To bridge this gap, we present an empirical study of pseudocode-based code translation that investigates how well it enhances the direct translation approach, clarifies how to use it effectively, and identifies the limitations that hinder its potential benefits. By comparing direct and pseudocode-based translation on 9,690 translation tasks across six PLs with five popular LLMs, we demonstrate that pseudocode-based translation can effectively complement direct translation, particularly when translating from flexible to rigid PLs or when the low-resource PL Rust is involved. Based on these findings, we suggest adopting strategies that combine the complementary strengths of both approaches to improve code translation accuracy. We also reveal the advantages of pseudocode-based translation in disentangling the translation of complicated programs and in mitigating distractions from implementation details in the original programs, as well as its limitations due to incorrect, incomplete, or ambiguous pseudocode.

Problem

Enhancing code translation accuracy by using pseudocode as an intermediate semantic representation

Investigating whether a pseudocode-based approach can complement direct code-to-code translation

Addressing limitations of direct translation for flexible-to-rigid language conversions

Innovation

Uses pseudocode as an intermediate step for translation

Combines pseudocode-based and direct translation approaches

Translates program intent before target language implementation
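Because the two approaches succeed on different programs, one simple way to combine them is a fallback: accept the direct translation if it passes validation, otherwise try the pseudocode-based route. The sketch below assumes this ordering and a hypothetical `passes` validator (for example, compiling the candidate and running test cases); neither detail is taken from the paper, and the translators in the usage lines are toy stand-ins rather than real LLM calls.

```python
from typing import Callable, Optional

# Hypothetical types: a translator maps a source program to a target program;
# a validator decides whether a candidate is acceptable (e.g. compiles and passes tests).
Translator = Callable[[str], str]
Validator = Callable[[str], bool]


def combined_translate(
    source: str,
    direct: Translator,
    via_pseudocode: Translator,
    passes: Validator,
) -> Optional[str]:
    """Try direct translation first; if its output fails validation,
    fall back to pseudocode-based translation. Returns None if both fail."""
    for translate in (direct, via_pseudocode):
        candidate = translate(source)
        if passes(candidate):
            return candidate
    return None


# Toy stand-ins: the direct candidate fails validation, the pseudocode-based one passes.
result = combined_translate(
    "src",
    direct=lambda s: "direct:" + s,
    via_pseudocode=lambda s: "pseudo:" + s,
    passes=lambda code: code.startswith("pseudo:"),
)
print(result)  # pseudo:src
```

Trying the direct route first keeps the cheaper single-step path for programs it already handles; reversing the order, or sampling both and keeping whichever validates, are equally plausible combination strategies.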
