Fast Spectrogram Event Extraction via Offline Self-Supervised Learning: From Fusion Diagnostics to Bioacoustics

📅 2026-02-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of efficiently extracting coherent, quasi-coherent, and transient modes from high-noise time-frequency signals, which next-generation fusion devices such as ITER are expected to generate at petabyte scale each day, beyond what manual analysis can handle. The authors propose a “signals-first” self-supervised framework that combines offline self-supervised learning with nonlinear optimal multichannel signal processing and a fast neural network surrogate model. The approach automatically identifies modes in multi-diagnostic data (fast magnetics, electron cyclotron emission, CO₂ interferometry, and beam emission spectroscopy) with a reported inference latency of 0.5 seconds. Results are tested on DIII-D, TJ-II, and non-fusion spectrogram datasets, and the authors position the framework for real-time mode identification and large-scale automated database generation in support of advanced plasma control.

📝 Abstract
Next-generation fusion facilities like ITER face a "data deluge," generating petabytes of multi-diagnostic signals daily that challenge manual analysis. We present a "signals-first" self-supervised framework for the automated extraction of coherent and transient modes from high-noise time-frequency data. We also develop a general-purpose method and tool for extracting coherent, quasi-coherent, and transient modes from fluctuation measurements in tokamaks by employing non-linear optimal techniques in multichannel signal processing with a fast neural network surrogate on fast magnetics, electron cyclotron emission, CO2 interferometer, and beam emission spectroscopy measurements from DIII-D. Results are tested on data from DIII-D, TJ-II, and non-fusion spectrograms. With an inference latency of 0.5 seconds, this framework enables real-time mode identification and large-scale automated database generation for advanced plasma control. The code repository is at https://github.com/PlasmaControl/TokEye.
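
To make "coherent modes in high-noise time-frequency data" concrete, the following is a minimal sketch of a classical baseline: a short-time Fourier transform followed by a global median-based threshold. It is not the method described in the abstract and does not use the TokEye repository; the function names (`stft_power`, `coherent_mask`), the synthetic 1 kHz test signal, and the threshold factor `k` are all assumptions chosen for illustration.

```python
import numpy as np

def stft_power(x, fs, nperseg=256, hop=128):
    """Hann-windowed STFT power; returns (freqs, times, power[freq, time])."""
    window = np.hanning(nperseg)
    n_frames = 1 + (len(x) - nperseg) // hop
    frames = np.stack(
        [x[i * hop : i * hop + nperseg] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    times = (np.arange(n_frames) * hop + nperseg / 2) / fs
    return freqs, times, power.T

def coherent_mask(power, k=20.0):
    """Flag bins whose power exceeds k times the global median power.

    For white noise the per-bin power is roughly exponentially distributed,
    so the median is a robust noise-floor estimate that a few strong
    mode bins barely move. k is an illustrative choice, not a tuned value.
    """
    return power > k * np.median(power)

# Synthetic stand-in for a diagnostic channel: a 1 kHz tone buried in noise.
fs = 10_000.0
rng = np.random.default_rng(0)
t = np.arange(int(fs)) / fs
signal = np.sin(2 * np.pi * 1000.0 * t) + 0.5 * rng.standard_normal(t.size)

freqs, times, power = stft_power(signal, fs)
mask = coherent_mask(power)

# Frequency bin with the largest time-averaged power.
peak_freq = freqs[power.mean(axis=1).argmax()]
print(f"strongest bin near {peak_freq:.1f} Hz; "
      f"{mask.mean():.2%} of bins flagged")
```

A fixed threshold like this only separates a steady tone from stationary noise. Quasi-coherent and transient events, multichannel fusion, and non-stationary backgrounds are the cases the paper's learned approach is aimed at.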

Problem

Research questions and friction points this paper is trying to address.

data deluge
coherent modes
transient modes
time-frequency data
automated extraction

Innovation

Methods, ideas, or system contributions that make the work stand out.

self-supervised learning
time-frequency analysis
multichannel signal processing
neural network surrogate
real-time mode extraction

Authors

Nathaniel Chen
Kouroche Bouchiat
Peter Steiner
Andrew Rothstein
David Smith
Max Austin
Mike van Zeeland
Azarakhsh Jalalvand (Researcher at Princeton University)
Egemen Kolemen (Princeton University, Plasma Control)