🤖 AI Summary
Large language models struggle to ensure both syntactic correctness and semantic consistency in code translation, a problem compounded by the unreliable semantic rewards that existing preference learning relies on. The authors propose CTO, an approach that, for the first time, uses cross-lingual contrastive learning to build an evaluator of functional equivalence between source and target code. This evaluator is combined with compiler-derived syntactic feedback in a multi-objective preference optimization framework. Evaluated on bidirectional translation tasks among C++, Java, and Python, CTO significantly outperforms current baselines and alternative preference optimization strategies, improving both the syntactic validity and the semantic fidelity of the translated code.
📝 Abstract
Large language models (LLMs) have shown immense potential for code translation, yet they often struggle to ensure both syntactic correctness and semantic consistency. While preference-based learning offers a promising alignment strategy, it is hindered by unreliable semantic rewards derived from sparse test cases or restrictive reference translations. We argue that a robust semantic reward for code translation must be derived directly from the source code. In this paper, we propose CTO, which improves code translation with syntax-guided and semantic-aware preference optimization. Through contrastive learning, we train a cross-lingual semantic model to directly assess functional equivalence between source and translated code. By formulating code translation as a multi-objective optimization problem, we unify this semantic signal with compiler-based syntactic feedback within the direct preference optimization (DPO) framework. Extensive experiments on C++, Java, and Python translations demonstrate that CTO significantly outperforms existing baselines and alternative preference optimization strategies.
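The abstract does not say how the syntactic and semantic signals are combined or how preference pairs are formed, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a Python target, uses Python's built-in `compile` as a stand-in for compiler feedback, treats the semantic score as a number already produced by some cross-lingual model, and merges the two with a hypothetical weighted sum. The names `syntax_score`, `combined_reward`, and `pick_preference_pair` are illustrative. The last function is the standard DPO loss for a single preference pair.

```python
import math


def syntax_score(python_source: str) -> float:
    """Return 1.0 if the candidate compiles, else 0.0.

    Assumption: the target language is Python, so the built-in compile()
    stands in for the compiler feedback mentioned in the abstract.
    """
    try:
        compile(python_source, "<candidate>", "exec")
        return 1.0
    except (SyntaxError, ValueError):
        return 0.0


def combined_reward(syntax: float, semantic: float, weight: float = 0.5) -> float:
    """Hypothetical weighted sum of the two objectives (not from the paper)."""
    return weight * syntax + (1.0 - weight) * semantic


def pick_preference_pair(candidates):
    """Pick (chosen, rejected) from a list of (translation, semantic_score).

    Assumption: the best- and worst-scoring candidates form the pair, and
    semantic_score is a placeholder for the learned model's output.
    """
    ranked = sorted(
        candidates,
        key=lambda c: combined_reward(syntax_score(c[0]), c[1]),
        reverse=True,
    )
    return ranked[0][0], ranked[-1][0]


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one pair: -log(sigmoid(margin))."""
    margin = beta * (
        (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    )
    # Numerically stable form of log(1 + exp(-margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    candidates = [
        ("def add(a, b) return a + b", 0.9),          # does not compile
        ("def add(a, b):\n    return a + b\n", 0.8),  # compiles
    ]
    chosen, rejected = pick_preference_pair(candidates)
    print("chosen:", repr(chosen))
    print("rejected:", repr(rejected))
    print("loss:", dpo_loss(-1.0, -5.0, -3.0, -3.0))
```

In this sketch a candidate that fails to compile is rejected even when its semantic score is higher, which illustrates why the two signals are treated as separate objectives. The equal weighting is arbitrary.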