WaveMind: Towards a Conversational EEG Foundation Model Aligned to Textual and Visual Modalities

📅 2025-09-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: EEG signals simultaneously encode both cognitive processes and intrinsic neural states, so the data paired with an EEG recording (text or images) captures only part of what the signal contains, and this mismatch hinders cross-modal representation learning. Method: We propose a conversational EEG foundation model that exploits the complementarity between these two components. It maps EEG and its paired text and visual modalities into a unified semantic space, combining EEG time-frequency encoding, text–vision alignment learning, and instruction tuning. Contribution/Results: We release WaveMind-Instruct-338k, presented as the first cross-task EEG dataset for instruction tuning (338K samples). The resulting model shows robust classification accuracy and supports open-ended natural language interaction across four downstream tasks. The work is framed as a step toward general-purpose neural decoding and interpretable brain–computer interfaces built on multimodal large language models.

📝 Abstract

Electroencephalography (EEG) interpretation using multimodal large language models (MLLMs) offers a novel approach for analyzing brain signals. However, the complex nature of brain activity introduces critical challenges: EEG signals simultaneously encode both cognitive processes and intrinsic neural states, creating a mismatch in EEG paired-data modality that hinders effective cross-modal representation learning. Through a pilot investigation, we uncover complementary relationships between these modalities. Leveraging this insight, we propose mapping EEG signals and their corresponding modalities into a unified semantic space to achieve generalized interpretation. To fully enable conversational capabilities, we further introduce WaveMind-Instruct-338k, the first cross-task EEG dataset for instruction tuning. The resulting model demonstrates robust classification accuracy while supporting flexible, open-ended conversations across four downstream tasks, thereby offering valuable insights for both neuroscience research and the development of general-purpose EEG models.
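To make the idea of a unified semantic space concrete, here is a minimal sketch of one common way to align two modalities: a CLIP-style symmetric contrastive loss. This is an assumption for illustration only. The abstract does not state which alignment objective WaveMind uses, and the function names, the temperature value, and the toy embeddings below are all hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_contrastive_loss(eeg_emb, target_emb, temperature=0.07):
    """CLIP-style loss (assumed objective, not confirmed by the source).

    Row i of eeg_emb and row i of target_emb are treated as a matched pair;
    every other row in the batch acts as a negative.
    """
    eeg = l2_normalize(eeg_emb)
    tgt = l2_normalize(target_emb)
    logits = eeg @ tgt.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(logits.shape[0])
    loss_eeg_to_tgt = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_tgt_to_eeg = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_eeg_to_tgt + loss_tgt_to_eeg)


def retrieve(eeg_emb, target_emb):
    """Return, for each EEG embedding, the index of the most similar target."""
    sims = l2_normalize(eeg_emb) @ l2_normalize(target_emb).T
    return sims.argmax(axis=1)


# Toy data: random vectors stand in for encoder outputs (hypothetical).
rng = np.random.default_rng(0)
target = rng.normal(size=(8, 16))                        # e.g. text or image embeddings
aligned_eeg = target + 0.01 * rng.normal(size=(8, 16))   # EEG embeddings near their pairs
random_eeg = rng.normal(size=(8, 16))                    # EEG embeddings with no alignment

aligned_loss = symmetric_contrastive_loss(aligned_eeg, target)
random_loss = symmetric_contrastive_loss(random_eeg, target)
```

In this toy setup the aligned embeddings give a lower loss than the unaligned ones, and `retrieve(aligned_eeg, target)` picks each sample's own pair. Once embeddings share a space like this, classification can be treated as nearest-neighbor retrieval against label or image embeddings, which is one plausible reading of "generalized interpretation" in the abstract.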

Problem

Research questions and friction points this paper is trying to address.

Aligns EEG signals with textual and visual modalities

Resolves mismatch in EEG paired-data modality representation

Enables conversational EEG interpretation through unified semantic mapping

Innovation

Methods, ideas, or system contributions that make the work stand out.

Mapping EEG signals and their paired modalities into a unified semantic space

Introducing the first cross-task EEG dataset for instruction tuning

Enabling conversational capabilities across multiple tasks

🔎 Similar Papers

Thought2Text: Text Generation from EEG Signal using Large Language Models (LLMs)