🤖 AI Summary
This study addresses the challenge of limited labeled data in clinical brain-to-text interfaces for paralyzed patients by proposing a long-context pre-trained brain-to-text decoding model. The model is pre-trained on magnetoencephalography (MEG) sequences spanning up to 2.5 minutes—equivalent to 191k tokens—enabling it to leverage neural signal context at a scale 5 to 300 times greater than previous approaches. Remarkably, with only one hour of annotated data, the method achieves performance comparable to conventional supervised models trained on 50 hours of data. Furthermore, it surpasses existing brain foundation models in word-level decoding tasks, substantially improving both data efficiency and decoding accuracy.
📝 Abstract
Clinical brain-to-text interfaces are designed for paralysed patients who cannot provide extensive training recordings. Pre-training improves data-efficient generalisation by learning statistical priors across subjects, but these priors critically depend on context. While natural speech can unfold gradually over minutes, most methods pre-train with only a few seconds of context. We therefore propose MEG-XL, a model pre-trained with 2.5 minutes of MEG context per sample, 5-300x longer than prior work and equivalent to 191k tokens, capturing extended neural context. Fine-tuned on the task of word decoding from brain data, MEG-XL matches supervised performance with a fraction of the data (e.g. 1 hr vs 50 hrs) and outperforms brain foundation models. We find that models pre-trained with longer contexts learn representations that transfer better to word decoding. Our results indicate that long-context pre-training exploits extended neural context that other methods unnecessarily discard. Code, model weights, and instructions are available at https://github.com/neural-processing-lab/MEG-XL.
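As a back-of-the-envelope check, the quoted "5-300x" range together with the 2.5-minute (150 s) context implies prior-work context windows of roughly 0.5 s to 30 s. The sketch below verifies this arithmetic; the 0.5 s and 30 s bounds are inferred from the stated ratios, not figures given in the abstract.

```python
# Sanity-check the context-length ratios quoted in the abstract.
# Assumption: the "few seconds" of context in prior work spans
# roughly 0.5 s to 30 s — inferred from the 5-300x claim, not stated.
meg_xl_context_s = 2.5 * 60  # MEG-XL's context: 2.5 minutes = 150 s

short_prior_s = 0.5   # hypothetical shortest prior-work context
long_prior_s = 30.0   # hypothetical longest prior-work context

max_ratio = meg_xl_context_s / short_prior_s  # vs the shortest prior context
min_ratio = meg_xl_context_s / long_prior_s   # vs the longest prior context
print(f"MEG-XL context is {min_ratio:.0f}x to {max_ratio:.0f}x longer")
# → MEG-XL context is 5x to 300x longer
```

This is consistent with the abstract's claim that MEG-XL's context is 5-300x longer than previous approaches.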