🤖 AI Summary
This work addresses two limitations of current large language model approaches to multilingual code translation: training-based methods depend on pairwise parallel corpora and therefore generalize poorly to low-resource programming languages, and existing reinforcement learning reward formulations are suboptimal. The authors propose CodePivot, a training framework that uses Python as an intermediate representation, enabling translation among multiple languages without paired parallel data. They also introduce an Aggressive-Partial-Functional reinforcement learning reward mechanism to improve translation quality. In experiments across ten programming languages, the resulting 7B-parameter model, trained on Python-to-Others tasks, outperforms mainstream models with hundreds of billions more parameters: Deepseek-R1 on Python-to-Others tasks and Qwen3-235B-A22B-Instruct-2507 on Others-to-All tasks.
📝 Abstract
Transpilation, or code translation, aims to convert source code from one programming language (PL) to another. It benefits many downstream applications, from modernizing large legacy codebases to augmenting data for low-resource PLs. Recent large language model (LLM)-based approaches have demonstrated immense potential for code translation. Among these approaches, training-based methods are particularly important because, without targeted training, LLMs do not adapt effectively to domain-specific settings in which they lack knowledge, as is evident in transpilation tasks involving low-resource PLs. However, existing training-based approaches rely on a pairwise transpilation paradigm, which makes it impractical to support a diverse range of PLs. This limitation is particularly prominent for low-resource PLs, where training data is scarce. Furthermore, these methods suffer from suboptimal reinforcement learning (RL) reward formulations. To address these limitations, we propose CodePivot, a training framework that leverages Python as an intermediate representation (IR), augmented by a novel RL reward mechanism, the Aggressive-Partial-Functional reward, to bootstrap the model's multilingual transpilation ability without requiring parallel corpora. Experiments involving 10 PLs show that the resulting 7B model, trained on Python-to-Others tasks, consistently improves performance on both general and low-resource PL-related transpilation tasks. It outperforms substantially larger mainstream models with hundreds of billions more parameters, such as Deepseek-R1 and Qwen3-235B-A22B-Instruct-2507, on Python-to-Others tasks and Others-to-All tasks, respectively. In addition, on general transpilation tasks it outperforms a counterpart trained directly on Any-to-Any tasks. The code and data are available at https://github.com/lishangyu-hkust/CodePivot.
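To make the scaling argument against the pairwise paradigm concrete, the following minimal sketch counts how many translation directions must be covered with and without a pivot language. It is an illustration only, not the CodePivot training pipeline: the language list `LANGS` and the helpers `pairwise_directions`, `pivot_directions`, and `route` are hypothetical names introduced here, and the abstract does not state which 10 PLs were used or that inference is literally routed through Python.

```python
from itertools import permutations

# Hypothetical set of 10 PLs; the abstract does not name the ones actually used.
LANGS = ["Python", "C", "C++", "Java", "Go", "Rust",
         "JavaScript", "Ruby", "Lua", "Swift"]


def pairwise_directions(languages):
    """Every ordered (source, target) pair a pairwise paradigm must cover."""
    return list(permutations(languages, 2))


def pivot_directions(languages, pivot="Python"):
    """Directions needed if every translation may pass through the pivot."""
    others = [lang for lang in languages if lang != pivot]
    to_pivot = [(lang, pivot) for lang in others]
    from_pivot = [(pivot, lang) for lang in others]
    return to_pivot + from_pivot


def route(source, target, pivot="Python"):
    """Hops for a single request under pivot routing (assumed scheme)."""
    if source == target:
        return []
    if pivot in (source, target):
        return [(source, target)]
    return [(source, pivot), (pivot, target)]


if __name__ == "__main__":
    print(len(pairwise_directions(LANGS)))  # N * (N - 1) directions
    print(len(pivot_directions(LANGS)))     # 2 * (N - 1) directions
    print(route("Java", "Rust"))
```

With N languages, the pairwise count grows as N(N-1), while the pivot count grows as 2(N-1); the gap widens as more low-resource PLs are added, which is the practical motivation for a single intermediate language.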