Leveraging AI for Productive and Trustworthy HPC Software: Challenges and Research Directions

📅 2025-05-13
🤖 AI Summary
This study addresses the challenge of deploying AI technologies in high-performance computing (HPC) software development. It systematically identifies lifecycle bottlenecks hindering large language model (LLM) adoption—specifically in scientific computing semantic understanding, trustworthiness assurance, and performance portability. To overcome these, we propose a “scientific computing semantic constraint–driven” AI-augmented development paradigm, integrating program semantic modeling, domain-knowledge injection, LLM-assisted code generation, and formal verification. We further introduce the first AI-readiness benchmark tailored for HPC. The framework has been deployed in two major U.S. Department of Energy projects—Ellora and Durban—demonstrating measurable improvements in HPC software development efficiency and reliability. Our work establishes six core research directions for AI–HPC co-development and delivers both a methodological foundation and infrastructure support for building trustworthy, formally verifiable, and high-performance AI-native scientific software.

📝 Abstract
We discuss the challenges and propose research directions for using AI to revolutionize the development of high-performance computing (HPC) software. AI technologies, in particular large language models, have transformed every aspect of software development. For its part, HPC software is recognized as a highly specialized scientific field of its own. We discuss the challenges associated with leveraging state-of-the-art AI technologies to develop such a unique and niche class of software and outline our research directions in the two US Department of Energy-funded projects for advancing HPC software via AI: Ellora and Durban.
Problem

Research questions and friction points this paper is trying to address.

Using AI to improve HPC software development
Addressing challenges in AI for niche HPC applications
Proposing research directions for trustworthy AI in HPC
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using AI to develop HPC software
Applying large language models in HPC
Advancing HPC via AI projects Ellora and Durban
Keita Teranishi
Oak Ridge National Laboratory
high performance computing
Harshitha Menon
Lawrence Livermore National Lab
Parallel Computing
William F. Godoy
Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
Prasanna Balaprakash
Director of AI Programs and Distinguished R&D Staff Scientist, Oak Ridge National Laboratory
AI for Science, Scientific Machine Learning, High Performance Computing
David Bau
Assistant Professor at Northeastern University
Machine Learning, Computer Vision, NLP, Software Engineering, HCI
Tal Ben-Nun
Lawrence Livermore National Laboratory
High Performance Computing, Parallel and Distributed Algorithms, Programming Models, Machine Learning
Abhinav Bhatele
Associate Professor of Computer Science, University of Maryland, College Park
Parallel Systems and Software, Distributed AI, HPC, MLforSys
Franz Franchetti
Carnegie Mellon University
Autotuning, compilers, computer architecture 3D memory, parallel architectures, performance engineering
Michael Franusich
SpiralGen Inc., Pittsburgh, Pennsylvania, USA
Todd Gamblin
Lawrence Livermore National Laboratory
hpc, parallel computing, performance, dependency management, developer tools
Giorgis Georgakoudis
Lawrence Livermore National Laboratory, Computer Scientist
Computer Science, High Performance Computing, Parallel Programming, Fault Tolerance, Network
Tom Goldstein
Volpi-Cupal Professor of Computer Science, University of Maryland
Numerical Optimization, Machine Learning, Distributed Computing, Computer Vision
Arjun Guha
Northeastern University
Programming Languages
Steven Hahn
Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
Costin Iancu
Senior Staff Scientist, Lawrence Berkeley National Laboratory
quantum synthesis, large scale, optimization
Zheming Jin
Oak Ridge National Lab
Heterogeneous computing
Terry Jones
Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
Tze-Meng Low
Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
Het Mankad
Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
Narasinga Rao Miniskar
Oak Ridge National Laboratory
Neuromorphic, Heterogeneous computing, Deep Learning, Compilers, CGRA
Mohammad Alaul Haque Monil
Research Scientist, Advanced Computing Systems Research section, Oak Ridge National Laboratory
High Performance Computing, Heterogeneous Systems Performance measurement and Analysis of Task Based Runtimes, Energy aware clou
Daniel Nichols
Doctoral Student, University of Maryland, College Park
computer science, high performance computing, deep learning
Konstantinos Parasyris
Lawrence Livermore National Lab
Approximate computing, compilers, runtime systems, operating systems, GPGPU
Swaroop Pophale
ORNL
Pedro Valero-Lara
Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
Jeffrey S. Vetter
Oak Ridge National Laboratory
high performance computing
Samuel Williams
Lawrence Berkeley National Laboratory, Berkeley, California, USA
Aaron Young
Georgia Institute of Technology
Wearable Robotics, Biomechanics, Deep Learning, Exoskeletons, Prosthetics