🤖 AI Summary
Code translation involving low-resource programming languages (e.g., Fortran) and emerging parallel frameworks (e.g., CUDA) suffers from a scarcity of high-quality parallel corpora, which degrades LLM performance on these tasks. To address this, the paper proposes a conversational, LLM-based automated data generation framework grounded in a dual-LLM question-answering mechanism. The framework employs a collaborative Questioner-Solver architecture that integrates compiler analysis, runtime execution feedback, and unit test validation to generate functionally verifiable translation pairs enriched with multi-step reasoning traces. Unlike the conventional source-target code-pair paradigm, this approach significantly improves functional consistency and reliability. Evaluated on C++→CUDA translation, fine-tuning a 7B open-weight model on the generated data yields a >56% improvement in unit test pass rate, and key metrics, including compilation success rate, surpass those of larger proprietary commercial systems.
📝 Abstract
Large language models (LLMs) have shown remarkable capabilities in code translation, yet their performance deteriorates in low-resource programming domains such as Fortran and emerging frameworks like CUDA, where high-quality parallel data are scarce. We present an automated dataset generation pipeline featuring a dual-LLM Questioner-Solver design that incorporates external knowledge from compilers and runtime feedback. Beyond traditional source-target code pair datasets, our approach additionally generates (1) verified translations with unit tests for assessing functional consistency, and (2) multi-turn dialogues that capture the reasoning process behind translation refinement. Applied to Fortran→C++ and C++→CUDA, the pipeline yields 3.64k and 3.93k dialogues, respectively. Fine-tuning on this data yields dramatic improvements in functional correctness, boosting unit test success rates by over 56% on the challenging C++→CUDA task. We show this data enables a 7B open-weight model to significantly outperform larger proprietary systems on key metrics like compilation success.