🤖 AI Summary
Current non-invasive BCI-based language decoding faces three key bottlenecks: underutilization of magnetoencephalography (MEG) signals, poor cross-sentence generalization, and the absence of multimodal fusion. Method: We propose the first end-to-end, multi-alignment MEG-to-text framework for natural-language reconstruction from entirely unseen sentences. Our approach introduces a Transformer-based architecture that jointly aligns neural time series, phonemes, and semantics, integrating self-supervised pretraining with cross-modal contrastive learning to systematically unify speech, semantic, and dynamic temporal information. Results: On the GWilliams dataset, our method achieves a BLEU-1 score of 10.44, improving by 4.95 points (+90%) over the strongest baseline and demonstrating substantially enhanced open-vocabulary text generation. This work addresses critical limitations in generalization and multimodal integration for non-invasive brain–computer interface language reconstruction.
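The paper does not publish this snippet; the following is a minimal PyTorch sketch of the kind of InfoNCE-style cross-modal contrastive alignment the summary describes, pulling each MEG segment embedding toward its paired speech/semantic embedding. All names and shapes (`contrastive_alignment_loss`, `meg_emb`, `speech_emb`, the 256-dim embeddings) are hypothetical placeholders; the paper's exact objective may differ.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(meg_emb: torch.Tensor,
                               target_emb: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE-style loss: each MEG embedding is attracted to its paired
    target (speech or semantic) embedding and repelled from the other
    in-batch examples, which serve as negatives.

    meg_emb, target_emb: (batch, dim) tensors of paired embeddings.
    """
    meg_emb = F.normalize(meg_emb, dim=-1)
    target_emb = F.normalize(target_emb, dim=-1)
    # Pairwise cosine similarities, scaled by temperature.
    logits = meg_emb @ target_emb.t() / temperature  # (batch, batch)
    labels = torch.arange(meg_emb.size(0), device=meg_emb.device)
    # Symmetric cross-entropy over both alignment directions.
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

# Usage with dummy features standing in for encoder outputs:
meg_emb = torch.randn(8, 256)     # e.g., Transformer-pooled MEG features
speech_emb = torch.randn(8, 256)  # e.g., pretrained speech-model features
loss = contrastive_alignment_loss(meg_emb, speech_emb)
```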
📝 Abstract
Deciphering language from brain activity is a crucial task in brain-computer interface (BCI) research. Non-invasive neural recording techniques such as electroencephalography (EEG) and magnetoencephalography (MEG) are becoming increasingly popular due to their safety and practicality, as they avoid invasive electrode implantation. However, current work leaves three points under-investigated: 1) a predominant focus on EEG, with limited exploration of MEG, which provides superior signal quality; 2) poor performance on unseen text, indicating the need for models that generalize better to diverse linguistic contexts; 3) insufficient integration of information from other modalities, which can constrain our capacity to comprehensively understand the intricate dynamics of brain activity. This study presents a novel approach for translating MEG signals into text using a speech-decoding framework with multiple alignments. Our method is the first end-to-end multi-alignment framework for generating entirely unseen text directly from MEG signals. On the GWilliams dataset, it raises the BLEU-1 score from the baseline's 5.49 to 10.44, a significant improvement that moves the model toward real-world applications and underscores its potential to advance BCI research. Code is available at https://github.com/NeuSpeech/MAD-MEG2text.
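For reference, BLEU-1 is BLEU restricted to unigram precision. A minimal sketch of how such scores could be computed with NLTK is shown below; the tokenized sentences are made-up placeholders, and the paper's actual evaluation script may differ in tokenization and smoothing.

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Hypothetical decoded outputs and references; real evaluation would use
# the model's generations on held-out (unseen) GWilliams sentences.
references = [[["the", "quick", "brown", "fox"]]]  # per-hypothesis reference lists
hypotheses = [["the", "quick", "red", "fox"]]

# weights=(1, 0, 0, 0) keeps only the unigram term, i.e. BLEU-1.
score = corpus_bleu(references, hypotheses,
                    weights=(1.0, 0, 0, 0),
                    smoothing_function=SmoothingFunction().method1)
print(f"BLEU-1: {100 * score:.2f}")  # reported scores use a 0-100 scale
```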