The Artificial Scientist -- in-transit Machine Learning of Plasma Simulations

📅 2025-01-06
🤖 AI Summary
To address the dual challenges of I/O bottlenecks and catastrophic forgetting in large-scale plasma simulations, this work introduces the Streaming AI Scientist framework, an *in-transit* machine learning paradigm that tightly couples simulation and learning. It bypasses filesystem I/O via zero-copy in-memory data streaming, enabling real-time co-execution of simulation and ML training. An asynchronous feature transformation pipeline processes the data in transit, and an experience-replay-based continual learning mechanism mitigates catastrophic forgetting in non-stationary physical processes. The framework supports cross-language integration with existing simulation codes without modifying them. Technically, it combines GPU acceleration (PIConGPU), streaming pipelines, asynchronous memory transfers, and optimizations for the Frontier exascale supercomputer. Evaluated on a thousand-GPU Kelvin–Helmholtz instability workflow on Frontier, it reduces I/O overhead by 90%, achieves storage-free, sub-second model updates, and enables online physical pattern recognition.
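The coupling described above (simulation, in-transit transformation, and training running concurrently, with data handed over in memory rather than through files) can be sketched as a three-stage producer/consumer pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: the functions `simulate`, `transform`, `train`, and `run_pipeline`, the bounded in-process queues standing in for the real streaming transport, and the per-snapshot normalization used as the "feature transformation" are all hypothetical.

```python
import queue
import random
import statistics
import threading
from typing import List, Tuple

END = object()  # sentinel that marks the end of the stream


def simulate(out_q: queue.Queue, n_steps: int, n_cells: int = 64) -> None:
    """Hypothetical stand-in for the simulation: emits one flattened field per step."""
    for step in range(n_steps):
        rng = random.Random(step)
        field = [rng.gauss(0.0, 1.0) + step for _ in range(n_cells)]
        # put() blocks while the queue is full, so a slow consumer applies
        # back-pressure instead of letting data pile up in memory
        out_q.put((step, field))
    out_q.put(END)


def transform(in_q: queue.Queue, out_q: queue.Queue) -> None:
    """In-transit feature transformation, running concurrently with the other stages."""
    while (item := in_q.get()) is not END:
        step, field = item
        mean = statistics.fmean(field)
        std = statistics.pstdev(field) or 1.0  # guard against a constant field
        out_q.put((step, [(x - mean) / std for x in field]))
    out_q.put(END)  # forward the end-of-stream marker


def train(in_q: queue.Queue, log: List[Tuple[int, float]]) -> None:
    """Consumer: one (placeholder) optimizer step per incoming snapshot."""
    while (item := in_q.get()) is not END:
        step, features = item
        log.append((step, statistics.fmean(features)))


def run_pipeline(n_steps: int = 20, maxsize: int = 4) -> List[Tuple[int, float]]:
    sim_to_tf: queue.Queue = queue.Queue(maxsize=maxsize)
    tf_to_train: queue.Queue = queue.Queue(maxsize=maxsize)
    log: List[Tuple[int, float]] = []
    stages = [
        threading.Thread(target=simulate, args=(sim_to_tf, n_steps)),
        threading.Thread(target=transform, args=(sim_to_tf, tf_to_train)),
        threading.Thread(target=train, args=(tf_to_train, log)),
    ]
    for t in stages:
        t.start()
    for t in stages:
        t.join()
    return log


if __name__ == "__main__":
    for step, feature_mean in run_pipeline(n_steps=5):
        print(step, round(feature_mean, 6))
```

The bounded queues are the key design choice in this sketch: the simulation only stalls when the downstream stages fall `maxsize` snapshots behind, and no snapshot is ever written to disk. Python threads share one interpreter, so this shows the data flow rather than real parallel speed-up; a production setup would place the stages in separate processes or on separate nodes.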

📝 Abstract
Increasing HPC cluster sizes and large-scale simulations that produce petabytes of data per run create massive I/O and storage challenges for analysis. Deep learning-based techniques, in particular, make use of these amounts of domain data to extract patterns that help build scientific understanding. Here, we demonstrate a streaming workflow in which simulation data is streamed directly to a machine-learning (ML) framework, circumventing the file system bottleneck. Data is transformed in transit, asynchronously to the simulation and the training of the model. With the presented workflow, data operations can be performed in common and easy-to-use programming languages, freeing the application user from adapting the application output routines. As a proof of concept, we consider a GPU-accelerated particle-in-cell (PIConGPU) simulation of the Kelvin-Helmholtz instability (KHI). We employ experience replay to avoid catastrophic forgetting in learning from this non-steady process in a continual manner. We detail challenges addressed while porting and scaling to the Frontier exascale system.
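The abstract does not say how the replay memory is managed. One common way to apply experience replay to an unbounded stream is a fixed-capacity reservoir (reservoir sampling, "Algorithm R"), which keeps a uniform random sample of everything seen so far; each training batch then mixes freshly streamed samples with replayed older ones. The sketch below assumes that strategy. The class `ReservoirReplayBuffer`, the helper `make_batch`, and the capacity and mixing ratio are hypothetical, not taken from the paper.

```python
import random
from typing import Generic, List, TypeVar

T = TypeVar("T")


class ReservoirReplayBuffer(Generic[T]):
    """Fixed-capacity buffer holding a uniform random sample of every item seen
    so far (reservoir sampling, Algorithm R)."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.items: List[T] = []
        self.n_seen = 0
        self.rng = random.Random(seed)

    def add(self, item: T) -> None:
        self.n_seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            # keep the new item with probability capacity / n_seen,
            # overwriting a uniformly chosen slot
            j = self.rng.randrange(self.n_seen)
            if j < self.capacity:
                self.items[j] = item

    def sample(self, k: int) -> List[T]:
        # never ask for more items than the buffer currently holds
        return self.rng.sample(self.items, min(k, len(self.items)))


def make_batch(fresh: List[T], buffer: ReservoirReplayBuffer[T], n_replay: int) -> List[T]:
    """Mix freshly streamed samples with replayed older ones, then store the fresh ones."""
    batch = list(fresh) + buffer.sample(n_replay)  # replay drawn before insertion
    for item in fresh:
        buffer.add(item)
    return batch


if __name__ == "__main__":
    buf: ReservoirReplayBuffer[int] = ReservoirReplayBuffer(capacity=16)
    for start in range(0, 100, 4):  # stream 100 "snapshots" in chunks of 4
        batch = make_batch(list(range(start, start + 4)), buf, n_replay=8)
    print(sorted(buf.items))
```

Sampling from the buffer before inserting the fresh chunk means the replayed part of every batch comes only from earlier snapshots. This is intended to keep early phases of a non-steady process, such as the linear growth of the KHI, represented in training after the simulation has moved on.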
Problem

Research questions and friction points this paper is trying to address.

Big Data
Data Processing Speed
Continuous Learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Real-time Data Processing
GPU Accelerated Simulations
Experience Replay
Jeffrey Kelling
Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Chemnitz University of Technology, Dresden, Germany
Vicente Bolea
Kitware Inc., Clifton Park, NY, United States of America
Michael Bussmann
Center for Advanced Systems Understanding
Matter under extreme conditions, accelerator physics, high performance computing, artificial intelligence, medical physics
Ankush Checkervarty
Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Dresden, Germany
A. Debus
Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Dresden, Germany
Jan Ebert
Forschungszentrum Jülich GmbH
Computer science, artificial intelligence, mathematics, physics
Greg Eisenhauer
Georgia Institute of Technology, Atlanta, GA, United States of America
Vineeth Gutta
University of Delaware, Newark, DE, United States of America
Stefan Kesselheim
Jülich Supercomputing Center, Jülich Research Centre
Machine Learning, Computer Simulation Methods, Statistical Mechanics
S. Klasky
Oak Ridge National Laboratory, Oak Ridge, TN, United States of America
R. Pausch
Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Dresden, Germany
N. Podhorszki
Oak Ridge National Laboratory, Oak Ridge, TN, United States of America
Franz Poschel
Center for Advanced Systems Understanding (CASUS), Görlitz, Germany
David Rogers
Research Engineer, Vanderbilt University
Jeyhun Rustamov
Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Dresden, Germany
Steve Schmerler
Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Dresden, Germany
U. Schramm
Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Dresden, Germany
K. Steiniger
Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Center for Advanced Systems Understanding (CASUS), Dresden, Germany
R. Widera
Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Dresden, Germany
Anna Willmann
Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Dresden, Germany
Sunita Chandrasekaran
Associate Professor, Dept. of CIS, University of Delaware
High Performance Computing, Parallel Programming, OpenMP, OpenACC, Supercomputing