🤖 AI Summary
This study systematically reviews recent advances in integrating EEG signals with generative AI, focusing on the core challenge of cross-modal generation: transforming neural activity into images, text, and speech. Methodologically, it unifies GANs, VAEs, diffusion models, and Transformer- and contrastive-learning-based frameworks into a structured technical taxonomy spanning datasets, feature-encoding strategies, evaluation metrics, and application scenarios. A key contribution is framing EEG-to-speech generation as an emerging research direction. The work identifies critical performance bottlenecks, including low signal-to-noise ratio, high inter-subject variability, and severe label scarcity, and advocates transferable representation learning as a viable path forward. Collectively, these findings provide methodological guidance for advancing neural decoding theory, enabling practical assistive communication technologies, and accelerating the real-world deployment of brain–computer interfaces.
📝 Abstract
The integration of Brain–Computer Interfaces (BCIs) with Generative Artificial Intelligence (GenAI) has opened new frontiers in brain signal decoding, enabling assistive communication, neural representation learning, and multimodal integration. BCIs, particularly those leveraging Electroencephalography (EEG), provide a non-invasive means of translating neural activity into meaningful outputs. Recent advances in deep learning, including Generative Adversarial Networks (GANs) and Transformer-based Large Language Models (LLMs), have significantly improved EEG-based generation of images, text, and speech. This paper provides a literature review of the state of the art in EEG-based multimodal generation, focusing on (i) EEG-to-image generation through GANs, Variational Autoencoders (VAEs), and Diffusion Models, and (ii) EEG-to-text generation leveraging Transformer-based language models and contrastive learning methods. Additionally, we discuss the emerging domain of EEG-to-speech synthesis, an evolving multimodal frontier. We highlight key datasets, use cases, challenges, and EEG feature-encoding methods that underpin generative approaches. By providing a structured overview of EEG-based generative AI, this survey aims to equip researchers and practitioners with insights to advance neural decoding, enhance assistive technologies, and expand the frontiers of brain–computer interaction.