🤖 AI Summary
This work proposes TokaMind, a multimodal Transformer-based pre-trained foundation model designed to address key challenges in tokamak plasma modeling—namely, the heterogeneity of multimodal diagnostic data, inconsistent sampling rates, and missing signals. TokaMind introduces a novel training-free DCT3D embedding method that enables unified representation of diverse data modalities, including time series, 2D profiles, and video streams. The architecture features plug-and-play embedding interfaces and component-wise selective loading, facilitating efficient fine-tuning and transfer learning. With support for VAE-based surrogate embeddings and robust handling of missing signals, TokaMind significantly outperforms baseline models on the TokaMark benchmark built on the MAST dataset. Notably, lightweight fine-tuning of the pre-trained model surpasses the performance achieved by training from scratch, demonstrating the efficacy and generalization capability of multimodal pre-training for fusion plasma diagnostics.
📝 Abstract
We present TokaMind, an open-source foundation model framework for fusion plasma modeling, based on a Multi-Modal Transformer (MMT) and trained on heterogeneous tokamak diagnostics from the publicly available MAST dataset. TokaMind supports multiple data modalities (time series, 2D profiles, and videos) with different sampling rates, robust missing-signal handling, and efficient task adaptation via selective loading and freezing of four model components. To represent multi-modal signals, we use a training-free Discrete Cosine Transform embedding (DCT3D) and provide a clean interface for alternative embeddings (e.g., Variational Autoencoders, VAEs). We evaluate TokaMind on the recently introduced MAST benchmark TokaMark, comparing training and embedding strategies. Our results show that fine-tuned TokaMind outperforms the benchmark baseline on all but one task, and that, for several tasks, lightweight fine-tuning yields better performance than training the same architecture from scratch under a matched epoch budget. These findings highlight the benefits of multi-modal pretraining for tokamak plasma dynamics and provide a practical, extensible foundation for future fusion modeling tasks. Training code and model weights will be made publicly available.
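To make the idea of a training-free DCT-based embedding concrete, here is a minimal sketch of how a 3D signal block (e.g., a video patch) could be mapped to a fixed-length token by keeping low-frequency DCT coefficients. The function name `dct3d_embed`, the coefficient-selection scheme, and the block sizes are illustrative assumptions, not the paper's exact DCT3D implementation.

```python
import numpy as np
from scipy.fft import dctn


def dct3d_embed(x: np.ndarray, n_coeffs: tuple = (4, 4, 4)) -> np.ndarray:
    """Embed a 3D block (e.g., time x height x width video patch) by
    keeping the low-frequency corner of its 3D DCT-II coefficients.

    No parameters are learned: the embedding is a fixed transform, so it
    needs no training data and works identically across modalities.
    """
    coeffs = dctn(x, type=2, norm="ortho")  # full 3D DCT-II of the block
    kt, kh, kw = n_coeffs
    # Low frequencies sit in the corner of the coefficient array;
    # truncating there yields a compact, fixed-length token vector.
    return coeffs[:kt, :kh, :kw].ravel()


# A 1D time series or 2D profile can reuse the same routine by adding
# singleton axes, giving all modalities a common embedding interface.
video_patch = np.random.default_rng(0).normal(size=(16, 32, 32))
token = dct3d_embed(video_patch)
print(token.shape)  # (64,)
```

Because the transform is fixed rather than learned, such an embedding can be swapped for a trained alternative (e.g., a VAE encoder) behind the same interface, which is the kind of plug-and-play design the abstract describes.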