🤖 AI Summary
This study addresses the modernization of legacy Fortran codes in high-performance computing (HPC) by systematically evaluating the applicability and accuracy of large language models (LLMs) for cross-language translation (Fortran → C++). We propose the first reproducible proxy-based translation evaluation framework, quantifying performance across four dimensions: compilation correctness, semantic fidelity (measured via CodeBLEU), numerical consistency, and cross-platform compatibility (x86/ARM). Evaluated on diverse scientific computing benchmarks using open-weight LLMs, our approach achieves up to an 89% compilation success rate, 76% average semantic similarity relative to human-authored translations, and >92% numerical consistency. Our key contribution is establishing the first multi-dimensional evaluation paradigm for LLM-based translation of scientific code, empirically validating both its feasibility and its inherent limitations in realistic HPC environments.
📝 Abstract
Large Language Models (LLMs) are increasingly being leveraged to generate and translate scientific computing codes by both domain experts and non-domain experts. Fortran has served as one of the go-to programming languages of legacy high-performance computing (HPC) for scientific discovery. Despite growing adoption, LLM-based translation of legacy codebases has not been thoroughly assessed or quantified for its usability. Here, we studied the applicability of LLM-based translation of Fortran to C++ as a step towards building an agentic workflow using open-weight LLMs on two different computational platforms. We statistically quantified the compilation accuracy of the translated C++ codes, measured the similarity of the LLM-translated code to human-translated C++ code, and statistically quantified the output similarity between the original Fortran and the translated C++.
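The paper does not publish its evaluation scripts, but two of the metrics it describes (compilation success rate and numerical output consistency) can be sketched as simple functions. The tolerance value and function names below are illustrative assumptions, not the authors' actual implementation:

```python
import math

def compilation_rate(results):
    """Fraction of translated C++ programs that compiled successfully.

    `results` is a list of booleans, one per benchmark program
    (True = the translated code compiled without errors).
    """
    return sum(results) / len(results)

def numerical_consistency(fortran_out, cpp_out, rel_tol=1e-6):
    """Fraction of paired output values where the Fortran reference and
    the translated C++ agree within a relative tolerance.

    rel_tol=1e-6 is an assumed tolerance; the paper's threshold
    may differ.
    """
    if len(fortran_out) != len(cpp_out):
        return 0.0  # mismatched output shapes count as inconsistent
    matches = sum(
        math.isclose(f, c, rel_tol=rel_tol)
        for f, c in zip(fortran_out, cpp_out)
    )
    return matches / len(fortran_out)
```

For example, if 4 of 5 translations compile, `compilation_rate([True, True, False, True, True])` returns 0.8; comparing `[1.0, 2.0]` against `[1.0, 3.0]` with `numerical_consistency` yields 0.5. Semantic fidelity (CodeBLEU) would require a separate tool and is omitted here.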