Interpretable EEG-to-Image Generation with Semantic Prompts

📅 2025-07-09

📈 Citations: 0

✨ Influential: 0

career value

188K/year

🤖 AI Summary

This study addresses the limited spatial resolution of EEG signals, which hinders interpretability in visual decoding. We propose a semantic-prompt-mediated cross-modal decoding framework: leveraging hierarchical semantic descriptions generated by large language models as intermediaries, we align EEG representations with the semantic space via a Transformer-based EEG encoder, contrastive learning, and a projection head, subsequently driving a pretrained latent diffusion model to synthesize images. To our knowledge, this is the first method enabling interpretable, fine-grained mapping from EEG signals to semantic prompts, revealing the semantic topography of scalp potentials. Evaluated on the EEGCVPR dataset, our approach achieves state-of-the-art performance, significantly improving alignment between generated images and semantic content (+12.7% CLIP Score) and enhancing neuroscientific interpretability of the underlying neural mechanisms.

Technology Category

Application Category

📝 Abstract

Decoding visual experience from brain signals offers exciting possibilities for neuroscience and interpretable AI. While EEG is accessible and temporally precise, its limitations in spatial detail hinder image reconstruction. Our model bypasses direct EEG-to-image generation by aligning EEG signals with multilevel semantic captions -- ranging from object-level to abstract themes -- generated by a large language model. A transformer-based EEG encoder maps brain activity to these captions through contrastive learning. During inference, caption embeddings retrieved via projection heads condition a pretrained latent diffusion model for image generation. This text-mediated framework yields state-of-the-art visual decoding on the EEGCVPR dataset, with interpretable alignment to known neurocognitive pathways. Dominant EEG-caption associations reflected the importance of different semantic levels extracted from perceived images. Saliency maps and t-SNE projections reveal semantic topography across the scalp. Our model demonstrates how structured semantic mediation enables cognitively aligned visual decoding from EEG.

Problem

Research questions and friction points this paper is trying to address.

Decoding visual experience from EEG signals

Overcoming EEG's spatial detail limitations for image reconstruction

Aligning EEG signals with multilevel semantic captions

Innovation

Methods, ideas, or system contributions that make the work stand out.

Aligns EEG signals with multilevel semantic captions

Uses transformer-based EEG encoder for contrastive learning

Conditions latent diffusion model via caption embeddings

🔎 Similar Papers

Guess What I Think: Streamlined EEG-to-Image Generation with Latent Diffusion Models