🤖 AI Summary
This work proposes TokaMind, a multimodal Transformer-based pre-trained foundation model designed to address key challenges in tokamak plasma modeling—namely, the heterogeneity of multimodal diagnostic data, inconsistent sampling rates, and missing signals. TokaMind introduces a novel training-free DCT3D embedding method that enables unified representation of diverse data modalities, including time series, 2D profiles, and video streams. The architecture features plug-and-play embedding interfaces and component-wise selective loading, facilitating efficient fine-tuning and transfer learning. With support for VAE-based surrogate embeddings and robust handling of missing signals, TokaMind significantly outperforms baseline models on the TokaMark benchmark built on the MAST dataset. Notably, lightweight fine-tuning of the pre-trained model surpasses the performance achieved by training from scratch, demonstrating the efficacy and generalization capability of multimodal pre-training for fusion plasma diagnostics.
📝 Abstract
We present TokaMind, an open-source foundation model framework for fusion plasma modeling, based on a Multi-Modal Transformer (MMT) and trained on heterogeneous tokamak diagnostics from the publicly available MAST dataset. TokaMind supports multiple data modalities (time series, 2D profiles, and videos) with different sampling rates, robust missing-signal handling, and efficient task adaptation via selective loading and freezing of four model components. To represent multi-modal signals, we use a training-free Discrete Cosine Transform embedding (DCT3D) and provide a clean interface for alternative embeddings (e.g., Variational Autoencoders, VAEs). We evaluate TokaMind on the recently introduced MAST benchmark TokaMark, comparing training and embedding strategies. Our results show that fine-tuned TokaMind outperforms the benchmark baseline on all but one task, and that, for several tasks, lightweight fine-tuning yields better performance than training the same architecture from scratch under a matched epoch budget. These findings highlight the benefits of multi-modal pretraining for tokamak plasma dynamics and provide a practical, extensible foundation for future fusion modeling tasks. Training code and model weights will be made publicly available.
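To make the idea of a training-free DCT-based embedding concrete, here is a minimal sketch of how a 3D signal block (e.g., a video patch) could be mapped to a fixed-length token by keeping low-frequency DCT coefficients. The function name `dct3d_embed`, the coefficient-selection scheme, and the block sizes are illustrative assumptions, not the paper's exact DCT3D implementation.

```python
import numpy as np
from scipy.fft import dctn


def dct3d_embed(x: np.ndarray, n_coeffs: tuple = (4, 4, 4)) -> np.ndarray:
    """Embed a 3D block (e.g., time x height x width video patch) by
    keeping the low-frequency corner of its 3D DCT-II coefficients.

    No parameters are learned: the embedding is a fixed transform, so it
    needs no training data and works identically across modalities.
    """
    coeffs = dctn(x, type=2, norm="ortho")  # full 3D DCT-II of the block
    kt, kh, kw = n_coeffs
    # Low frequencies sit in the corner of the coefficient array;
    # truncating there yields a compact, fixed-length token vector.
    return coeffs[:kt, :kh, :kw].ravel()


# A 1D time series or 2D profile can reuse the same routine by adding
# singleton axes, giving all modalities a common embedding interface.
video_patch = np.random.default_rng(0).normal(size=(16, 32, 32))
token = dct3d_embed(video_patch)
print(token.shape)  # (64,)
```

Because the transform is fixed rather than learned, such an embedding can be swapped for a trained alternative (e.g., a VAE encoder) behind the same interface, which is the kind of plug-and-play design the abstract describes.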