Efficient dataset construction using active learning and uncertainty-aware neural networks for plasma turbulent transport surrogate models

📅 2025-07-21
🤖 AI Summary
To reduce the labelling cost and improve the sample efficiency of surrogate modeling for turbulent transport in tokamak plasmas, this work demonstrates a proof-of-principle uncertainty-aware active learning framework. It pairs a spectral-normalized neural Gaussian process (SNGP) classifier with a Bayesian neural network with noise contrastive priors (BNN-NCP) regressor, and places the QuaLiKiz quasilinear gyrokinetic code in the loop as the data labeller. Models are trained for all turbulent modes (ITG, TEM, ETG) and all transport fluxes in the general QuaLiKiz output. Over 45 active learning iterations, the training set grows from 10² to 10⁴ samples; on an independent test set, the models reach F₁ ≈ 0.8 for classification and R² ≈ 0.75 for regression across all outputs. The improvement rate diminishes faster than expected, but the technique is built from components that can be upgraded and generalized to other surrogate modeling applications.
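For reference, the two scores quoted above have the standard definitions below. The symbols $P$ (precision), $R$ (recall), $y_i$ (true value), $\hat{y}_i$ (prediction), and $\bar{y}$ (mean of the true values) are notation introduced here, and the source does not state how the scores are aggregated across outputs.

$$
F_1 = \frac{2\,P\,R}{P + R}, \qquad
R^2 = 1 - \frac{\sum_i \left(y_i - \hat{y}_i\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2}
$$

Both scores have a maximum of 1; $F_1$ is bounded below by 0, while $R^2$ can become negative when the model fits worse than a constant prediction at $\bar{y}$.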

📝 Abstract
This work demonstrates a proof-of-principle for using uncertainty-aware architectures, in combination with active learning techniques and an in-the-loop physics simulation code as a data labeller, to construct efficient datasets for data-driven surrogate model generation. Building on a previous proof-of-principle that demonstrated training set reduction on static pre-labelled datasets using the ADEPT framework, this strategy was applied again to the plasma turbulent transport problem within tokamak fusion plasmas, specifically to the QuaLiKiz quasilinear electrostatic gyrokinetic turbulent transport code. While QuaLiKiz provides relatively fast evaluations, this study specifically targeted small datasets to serve as a proxy for more expensive codes, such as CGYRO or GENE. The newly implemented algorithm uses the SNGP architecture for the classification component of the problem and the BNN-NCP architecture for the regression component, training models for all turbulent modes (ITG, TEM, ETG) and all transport fluxes ($Q_e$, $Q_i$, $\Gamma_e$, $\Gamma_i$, and $\Pi_i$) described by the general QuaLiKiz output. With 45 active learning iterations, moving from a small initial training set of $10^{2}$ to a final set of $10^{4}$, the resulting models reached an $F_1$ classification performance of ~0.8 and an $R^2$ regression performance of ~0.75 on an independent test set across all outputs. This extrapolates to reaching the same performance and efficiency as the previous ADEPT pipeline, although on a problem with 1 extra input dimension. While the improvement rate achieved in this implementation diminishes faster than expected, the overall technique is formulated with components that can be upgraded and generalized to many surrogate modeling applications beyond plasma turbulent transport predictions.
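The loop described in the abstract (train an uncertainty-aware model, pick the candidates it is least sure about, label them with the simulator, retrain) can be illustrated with a minimal sketch. This is not the paper's implementation: a bootstrap ensemble of quadratic least-squares fits stands in for the SNGP and BNN-NCP models, `toy_labeller` is a hypothetical stand-in for QuaLiKiz, and all function names, the 2D input space, and the batch sizes are assumptions made for illustration.

```python
import numpy as np


def toy_labeller(x):
    # Hypothetical stand-in for the physics code: a "flux" that is zero
    # below a critical threshold and grows above it.
    return np.maximum(x[:, 0] + 0.5 * x[:, 1] ** 2 - 0.6, 0.0)


def features(x):
    # Quadratic feature map for 2D inputs (an assumption of this sketch).
    x1, x2 = x[:, 0], x[:, 1]
    return np.stack([np.ones_like(x1), x1, x2, x1 * x2, x1 ** 2, x2 ** 2], axis=1)


def fit_ensemble(x, y, rng, n_members=10):
    # Bootstrap ensemble: each member is fit on a resampled training set.
    phi = features(x)
    weights = []
    for _ in range(n_members):
        idx = rng.integers(0, len(x), size=len(x))
        w, *_ = np.linalg.lstsq(phi[idx], y[idx], rcond=None)
        weights.append(w)
    return np.array(weights)  # shape (n_members, n_features)


def predict(weights, x):
    # Ensemble mean is the prediction; ensemble spread is the uncertainty.
    preds = features(x) @ weights.T  # shape (n_points, n_members)
    return preds.mean(axis=1), preds.std(axis=1)


def active_learning(n_init=20, n_iter=10, batch=10, pool_size=500, seed=0):
    rng = np.random.default_rng(seed)
    x_train = rng.uniform(0.0, 1.0, size=(n_init, 2))
    y_train = toy_labeller(x_train)
    for _ in range(n_iter):
        weights = fit_ensemble(x_train, y_train, rng)
        # Draw unlabelled candidates and keep the most uncertain ones.
        pool = rng.uniform(0.0, 1.0, size=(pool_size, 2))
        _, std = predict(weights, pool)
        chosen = pool[np.argsort(std)[-batch:]]
        # Label only the chosen points with the (toy) simulator.
        x_train = np.vstack([x_train, chosen])
        y_train = np.concatenate([y_train, toy_labeller(chosen)])
    weights = fit_ensemble(x_train, y_train, rng)
    return x_train, y_train, weights


def r2_score(y_true, y_pred):
    # Coefficient of determination on a held-out set.
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

A typical use would call `active_learning()`, draw an independent test set, label it with `toy_labeller`, and pass the ensemble mean from `predict` to `r2_score`. The key design point is that the labeller is called only on the selected batch, which is what keeps the labelled set small when the labeller is expensive.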
Problem

Research questions and friction points this paper is trying to address.

Efficient dataset construction for plasma turbulent transport models
Active learning with uncertainty-aware neural networks
Surrogate model generation for tokamak fusion plasmas
Innovation

Methods, ideas, or system contributions that make the work stand out.

Active learning optimizes dataset construction efficiency
Uncertainty-aware neural networks enhance surrogate model accuracy
Physics simulation code serves as automated data labeller