🤖 AI Summary
This work addresses multilingual code translation, which is constrained primarily by the scarcity of parallel data and by imbalanced optimization across language pairs. The authors propose BootTrans, which uses test suites as verification oracles to enforce cross-lingual functional equivalence. BootTrans employs a dual-pool bootstrapping architecture, comprising a seed pool and an exploration pool, to iteratively expand high-quality training data under execution guidance. It also introduces a language-aware dynamic weighting strategy that adaptively shifts training emphasis toward harder language pairs. On the HumanEval-X and TransCoder-Test benchmarks, BootTrans substantially outperforms the baseline LLMs in every translation direction.
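The dual-pool loop can be pictured with a small sketch. The summary does not specify how the pools interact, so everything below is an assumption for illustration: `Task`, `Pools`, `passes_tests`, and `bootstrap_round` are hypothetical names, translations are modeled as Python callables rather than programs in real target languages, and a scripted stub stands in for the LLM. Each assumed round samples up to `k` candidates per open task, moves the first candidate that passes every pivot-derived test into the seed pool, and leaves unsolved tasks in the exploration pool for later rounds. The RL update that would consume the collected experience is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical stand-in for an executable translation (real targets would be
# Java, C++, Go, ... programs run in a sandbox).
Candidate = Callable[[int], int]


@dataclass
class Task:
    """Hypothetical task: a target language plus tests adapted from the pivot language."""
    task_id: str
    target_lang: str
    tests: List[Tuple[int, int]]  # (input, expected_output) pairs


@dataclass
class Pools:
    seed: List[Tuple[Task, Candidate]] = field(default_factory=list)  # verified pairs
    exploration: List[Task] = field(default_factory=list)  # not yet solved


def passes_tests(task: Task, candidate: Candidate) -> bool:
    """Execution oracle: accept only if every adapted test passes."""
    try:
        return all(candidate(x) == y for x, y in task.tests)
    except Exception:
        return False  # a crash counts as a failed test


def bootstrap_round(pools: Pools, translate: Callable[[Task], Candidate], k: int = 4) -> int:
    """Assumed round: sample up to k candidates per open task, promote the first
    verified one to the seed pool, and keep unsolved tasks for the next round."""
    still_open: List[Task] = []
    promoted = 0
    for task in pools.exploration:
        for _ in range(k):
            candidate = translate(task)
            if passes_tests(task, candidate):
                pools.seed.append((task, candidate))
                promoted += 1
                break
        else:  # no sample passed all tests
            still_open.append(task)
    pools.exploration = still_open
    return promoted


# Toy usage: a scripted "model" replaces the LLM.
scripted: Dict[str, Iterator[Candidate]] = {
    "double": iter([lambda x: x + 2, lambda x: 2 * x]),  # fails, then succeeds
    "square": iter([lambda x: x + x] * 8),  # passes (2, 4) but never (3, 9)
}


def translate(task: Task) -> Candidate:
    # Fall back to an always-wrong candidate once the script is exhausted.
    return next(scripted[task.task_id], lambda _x: None)


pools = Pools(exploration=[
    Task("double", "java", [(1, 2), (3, 6)]),
    Task("square", "cpp", [(2, 4), (3, 9)]),
])
print(bootstrap_round(pools, translate))  # 1
print([t.task_id for t, _ in pools.seed], [t.task_id for t in pools.exploration])
# ['double'] ['square']
```

In this sketch a task whose samples only partially pass (here `square`) stays in the exploration pool, so only fully verified translations ever become training data.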
📝 Abstract
Code translation across multiple programming languages is essential yet challenging because of two major obstacles: the scarcity of parallel data paired with executable test oracles, and optimization imbalance across diverse language pairs. We propose BootTrans, a bootstrapping method that addresses both. Its key idea is to exploit the functional invariance and cross-lingual portability of test suites, adapting abundant pivot-language unit tests to serve as universal verification oracles for multilingual reinforcement learning (RL) training. Our method introduces a dual-pool architecture, with a seed pool and an exploration pool, that progressively expands training data through execution-guided experience collection. We also design a language-aware weighting mechanism that dynamically prioritizes harder translation directions based on their relative performance across sibling languages, mitigating optimization imbalance. Extensive experiments on the HumanEval-X and TransCoder-Test benchmarks show substantial improvements over baseline LLMs across all translation directions, and ablations validate the effectiveness of both the bootstrapping and weighting components.
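The abstract does not give the weighting formula, so the following is only one hypothetical way to "prioritize harder directions based on relative performance across sibling languages". It assumes that directions sharing a source language are siblings, and that each direction's weight is a softmax over negated pass rates within its sibling group (temperature `tau`, an assumed parameter), rescaled so the group's weights average to 1. The function name `language_aware_weights` and the pass rates are illustrative, not results from the paper.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Direction = Tuple[str, str]  # (source_lang, target_lang)


def language_aware_weights(pass_rate: Dict[Direction, float],
                           tau: float = 0.2) -> Dict[Direction, float]:
    """Hypothetical weighting: siblings are directions sharing a source language.
    Within a group, lower pass rate -> larger weight; weights average to 1."""
    groups: Dict[str, List[Direction]] = defaultdict(list)
    for direction in pass_rate:
        groups[direction[0]].append(direction)

    weights: Dict[Direction, float] = {}
    for dirs in groups.values():
        mean = sum(pass_rate[d] for d in dirs) / len(dirs)
        # Softmax over the gap below the sibling mean; softmax is shift-invariant,
        # so subtracting the mean only keeps the exponents small.
        scores = [math.exp((mean - pass_rate[d]) / tau) for d in dirs]
        total = sum(scores)
        for d, s in zip(dirs, scores):
            weights[d] = len(dirs) * s / total
    return weights


# Illustrative pass rates, not results from the paper.
rates = {("python", "java"): 0.8, ("python", "cpp"): 0.6, ("python", "go"): 0.4}
for direction, w in language_aware_weights(rates).items():
    print(direction, round(w, 3))  # go, the weakest direction, gets the largest weight
```

Under this assumption the weights would scale each direction's training loss, so a direction lagging its siblings contributes more to each update; lowering `tau` sharpens that preference, and equal pass rates give every sibling a weight of 1.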