🤖 AI Summary
This study addresses the end-to-end neural decoding of EEG signals into natural language text for wearable, low-cost “thought-to-text” applications. We propose the first cross-modal framework that cascades instruction-tuned large language models (LLaMA-3, Mistral-0.3, Qwen2.5) with an EEG feature encoder, enabling direct EEG-to-text mapping without intermediate image reconstruction. Our method employs a three-stage progressive fine-tuning strategy, integrating multimodal alignment training with end-to-end optimization that projects EEG embeddings directly into the LLM’s textual semantic space. Evaluated on a six-subject public EEG dataset, our approach achieves statistically significant improvements over state-of-the-art baselines in BLEU, METEOR, and human evaluations (fluency and adequacy). This work is the first to empirically validate the efficacy and robustness of instruction-tuned LLMs for semantic EEG decoding, establishing a new paradigm for direct brain–language translation.
📝 Abstract
Decoding and expressing brain activity in a comprehensible form is a challenging frontier in AI. This paper presents Thought2Text, which uses instruction-tuned Large Language Models (LLMs) fine-tuned with EEG data to achieve this goal. The approach involves three stages: (1) training an EEG encoder for visual feature extraction, (2) fine-tuning LLMs on image and text data, enabling multimodal description generation, and (3) further fine-tuning on EEG embeddings to generate text directly from EEG during inference. Experiments on a public EEG dataset collected from six subjects viewing image stimuli paired with text captions demonstrate the efficacy of multimodal LLMs (LLaMA-v3, Mistral-v0.3, Qwen2.5), validated using traditional language generation evaluation metrics as well as fluency and adequacy measures. This approach marks a significant advancement towards portable, low-cost "thoughts-to-text" technology with potential applications in both neuroscience and natural language processing.
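The core idea of stage (3), projecting EEG embeddings into the LLM's textual semantic space so they can condition generation, can be illustrated with a minimal sketch. All dimensions, shapes, and variable names below are hypothetical assumptions for illustration, not details from the paper:

```python
import numpy as np

# Hypothetical dimensions (not from the paper): a 128-dim EEG feature
# vector from the stage-1 encoder, projected into a 4096-dim LLM
# embedding space (a LLaMA-style hidden size).
EEG_DIM, LLM_DIM = 128, 4096

rng = np.random.default_rng(0)

# Stage-1 EEG encoder output: one feature vector per EEG segment
# (here, a batch of 2 segments with placeholder random values).
eeg_features = rng.standard_normal((2, EEG_DIM))

# Learnable linear projection (weights shown untrained): maps EEG
# embeddings into the LLM's token-embedding space so they can act
# as "soft prompt" vectors prepended to the text embeddings.
W = rng.standard_normal((EEG_DIM, LLM_DIM)) * 0.02
b = np.zeros(LLM_DIM)

soft_prompts = eeg_features @ W + b  # shape: (2, LLM_DIM)
print(soft_prompts.shape)
```

In a full pipeline these projected vectors would be concatenated with the LLM's token embeddings during fine-tuning and inference; this sketch only shows the shape-level mechanics of the cross-modal projection.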