🤖 AI Summary
This work addresses the challenge of modernizing legacy Fortran code in Large Hadron Collider (LHC) simulation software within high-energy physics. We propose a trustworthy, generative-AI–driven code migration methodology targeting Fortran-to-C++ translation, Fortran–C interoperability API generation, and structured code analysis. Our CodeScribe framework integrates domain-aware prompt engineering, static-analysis–guided API contract modeling, real-time developer supervision, and human-in-the-loop verification. The approach addresses the trustworthiness bottleneck of AI-generated code in scientific computing: it achieves high-fidelity translation of core modules, produces production-ready interoperable interfaces, improves code refactoring efficiency by over 3×, and reduces error rates to levels acceptable for manual review. By enabling rigorous validation and iterative refinement by developers, CodeScribe establishes a verifiable, collaborative paradigm for modernizing legacy scientific software.
📝 Abstract
The emergence of foundation models and generative artificial intelligence (GenAI) is poised to transform productivity in scientific computing, especially in code development, refactoring, and translation from one programming language to another. However, because the output of GenAI cannot be guaranteed to be correct, manual intervention remains necessary. Some of this intervention can be automated through task-specific tools, combined with methodologies for correctness verification and effective prompt development. We explored the application of GenAI to code translation, language interoperability, and codebase inspection within a legacy Fortran codebase used to simulate particle interactions at the Large Hadron Collider (LHC). In the process, we developed a tool, CodeScribe, which combines prompt engineering with user supervision to establish an efficient process for code conversion. In this paper, we demonstrate how CodeScribe assists in converting Fortran code to C++, generating Fortran-C APIs for integrating legacy systems with modern C++ libraries, and providing developer support for code organization and algorithm implementation. We also address the challenges of AI-driven code translation and highlight its benefits for enhancing productivity in scientific computing workflows.
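To make the idea of a Fortran-C API concrete, the following is a minimal sketch of what the C++ side of such an interoperability layer might look like. It is not taken from CodeScribe or from the LHC codebase: the routine `sum_of_squares`, the wrapper `sum_of_squares_c`, and the `kernels` namespace are hypothetical names chosen for illustration. The sketch assumes the Fortran caller declares the routine with `bind(C)` and passes its arguments by reference, which is the default when the `value` attribute is not used.

```cpp
#include <cassert>
#include <vector>

// Hypothetical C++ routine standing in for a kernel translated from Fortran.
// The name and signature are illustrative assumptions, not the paper's code.
namespace kernels {
inline double sum_of_squares(const std::vector<double>& values) {
    double total = 0.0;
    for (double v : values) {
        total += v * v;
    }
    return total;
}
}  // namespace kernels

// C-linkage wrapper that legacy Fortran code could call through ISO_C_BINDING.
// Without the Fortran `value` attribute, arguments arrive by reference,
// so the array length is received as a pointer to int.
extern "C" double sum_of_squares_c(const double* values, const int* n) {
    assert(values != nullptr && n != nullptr && *n >= 0);
    // Copy into a std::vector so the C++ kernel keeps an idiomatic signature.
    std::vector<double> buffer(values, values + *n);
    return kernels::sum_of_squares(buffer);
}

// A matching Fortran interface block (shown for reference only) could be:
//
//   interface
//     function sum_of_squares_c(values, n) &
//         bind(C, name="sum_of_squares_c") result(total)
//       use iso_c_binding, only: c_double, c_int
//       real(c_double), intent(in) :: values(*)
//       integer(c_int), intent(in) :: n
//       real(c_double) :: total
//     end function sum_of_squares_c
//   end interface
```

Keeping the wrapper thin, with all numerical logic in the plain C++ function, is one possible design choice: it lets the translated kernel be tested directly from C++ while the `extern "C"` layer handles only the calling convention.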