🤖 AI Summary
Addressing challenges in automatic L/H-mode and intermediate oscillatory (D-state) classification for tokamak plasmas—including ambiguous transition-region labeling and poor robustness under missing/abnormal signals—this paper proposes a dual-axis integrated framework. It couples a recurrent Fourier Neural Operator (FNO) for temporal-spectral modeling with gradient-boosting trees (XGBoost/LightGBM), jointly leveraging multi-source diagnostic feature grouping and ensemble learning, while supporting uncertainty quantification (ECE < 0.03) and fault-signal tolerance. We release the first open-source, fully annotated TCV dataset comprising 302 discharges. The method achieves Cohen’s kappa ≥ 0.89 for L/D/H-state classification and demonstrates strong robustness under signal dropout, out-of-distribution conditions, and boundary transition regions. Uncertainty estimates are validated via conditional calibration and exhibit physical consistency with plasma behavior.
📝 Abstract
Maximizing fusion performance in tokamaks relies on high energy confinement, often achieved through distinct operating regimes. The automated labeling of these confinement states is crucial to enable large-scale analyses or for real-time control applications. While this task becomes difficult to automate near state transitions or in marginal scenarios, much success has been achieved with data-driven models. However, these methods generally provide predictions as point estimates, and cannot adequately deal with missing and/or broken input signals. To enable wide-range applicability, we develop methods for confinement state classification with uncertainty quantification and model robustness. We focus on off-line analysis for TCV discharges, distinguishing L-mode, H-mode, and an in-between dithering phase (D). We propose ensembling data-driven methods on two axes: model formulations and feature sets. The former considers a dynamic formulation based on a recurrent Fourier Neural Operator-architecture and a static formulation based on gradient-boosted decision trees. These models are trained using multiple feature groupings categorized by diagnostic system or physical quantity. A dataset of 302 TCV discharges is fully labeled, and will be publicly released. We evaluate our method quantitatively using Cohen's kappa coefficient for predictive performance and the Expected Calibration Error for the uncertainty calibration. Furthermore, we discuss performance using a variety of common and alternative scenarios, the performance of individual components, out-of-distribution performance, cases of broken or missing signals, and evaluate conditionally-averaged behavior around different state transitions. Overall, the proposed method can distinguish L, D and H-mode with high performance, can cope with missing or broken signals, and provides meaningful uncertainty estimates.