Learning Interpretable Representations Leads to Semantically Faithful EEG-to-Text Generation

📅 2025-05-21

🤖 AI Summary

This study addresses semantic distortion and hallucination in non-invasive electroencephalography (EEG)-to-text generation. It proposes an abstractive decoding paradigm oriented toward core semantics: the model summarizes the core meaning of a stimulus rather than reconstructing it word for word. The authors frame the mismatch in information capacity between EEG and text as a posterior-collapse problem and introduce the Generative Language Inspection Model (GLIM), an interpretable framework for EEG representation learning and semantically faithful text generation that does not rely on teacher forcing. The method combines variational autoencoding, contrastive semantic alignment, cross-modal retrieval, and zero-shot classification in an end-to-end pipeline designed for small-scale, heterogeneous EEG data. On the ZuCo dataset, GLIM generates fluent, EEG-grounded sentences and supports zero-shot classification of sentiment, relation type, and topic, as well as bidirectional EEG–text retrieval. Together, these capabilities make decoding outputs more reliable and easier to evaluate.
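The summary lists contrastive semantic alignment as one component of the pipeline. This page does not specify GLIM's exact loss, so the sketch below assumes a generic CLIP-style symmetric InfoNCE objective over a batch of paired EEG and text embeddings, where the i-th EEG embedding belongs with the i-th text. The function names, the temperature of 0.07, and the toy vectors are illustrative assumptions, not the paper's implementation.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def similarity_matrix(a: List[Vector], b: List[Vector]) -> List[List[float]]:
    """Cosine similarity between every row of `a` and every row of `b`."""
    a_n = [l2_normalize(v) for v in a]
    b_n = [l2_normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, w)) for w in b_n] for u in a_n]


def cross_entropy_diagonal(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_info_nce(eeg: List[Vector], text: List[Vector],
                       temperature: float = 0.07) -> float:
    """Assumed CLIP-style contrastive loss: average of the EEG-to-text and
    text-to-EEG cross-entropies, with matched pairs on the diagonal."""
    sims = similarity_matrix(eeg, text)
    logits = [[s / temperature for s in row] for row in sims]
    logits_t = [list(col) for col in zip(*logits)]  # transpose: text-to-EEG
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


# Toy usage: correctly paired embeddings should give a lower loss than shuffled ones.
eeg = [[1.0, 0.0], [0.0, 1.0]]
text = [[0.9, 0.1], [0.1, 0.9]]
print(symmetric_info_nce(eeg, text) < symmetric_info_nce(eeg, text[::-1]))  # True
```

Minimizing a loss of this kind pulls each EEG embedding toward its own sentence and away from the other sentences in the batch, which is what makes the retrieval and zero-shot evaluations described in the abstract possible.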

📝 Abstract

Pretrained generative models have opened new frontiers in brain decoding by enabling the synthesis of realistic texts and images from non-invasive brain recordings. However, the reliability of such outputs remains questionable: do they truly reflect semantic activation in the brain, or are they merely hallucinated by the powerful generative models? In this paper, we focus on EEG-to-text decoding and address its hallucination issue through the lens of posterior collapse. Acknowledging the underlying mismatch in information capacity between EEG and text, we reframe the decoding task as semantic summarization of core meanings rather than the verbatim reconstruction of stimulus texts pursued in previous work. To this end, we propose the Generative Language Inspection Model (GLIM), which emphasizes learning informative and interpretable EEG representations to improve semantic grounding under heterogeneous and small-scale data conditions. Experiments on the public ZuCo dataset demonstrate that GLIM consistently generates fluent, EEG-grounded sentences without teacher forcing. Moreover, it supports more robust evaluation beyond text similarity, through EEG-text retrieval and zero-shot semantic classification across sentiment categories, relation types, and corpus topics. Together, our architecture and evaluation protocols lay the foundation for reliable and scalable benchmarking in generative brain decoding.
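The abstract proposes EEG-text retrieval and zero-shot semantic classification as evaluations that go beyond text similarity. A minimal sketch of how such metrics are commonly computed from embeddings follows, assuming EEG and text have already been projected into a shared space and that the i-th EEG embedding is paired with the i-th text. The function names, cosine scoring, and label embeddings are hypothetical illustrations, not GLIM's published evaluation code.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def retrieval_top_k_accuracy(eeg: List[Vector], text: List[Vector], k: int = 1) -> float:
    """EEG-to-text retrieval: fraction of EEG queries whose paired text
    (same index) ranks among the k most similar candidate texts."""
    hits = 0
    for i, query in enumerate(eeg):
        scores = [cosine(query, t) for t in text]
        ranked = sorted(range(len(text)), key=lambda j: scores[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(eeg)


def zero_shot_classify(eeg_embedding: Vector, label_embeddings: Dict[str, Vector]) -> str:
    """Pick the label whose text embedding (hypothetically, the embedding of a
    prompt such as 'a sentence with positive sentiment') is closest to the EEG embedding."""
    return max(label_embeddings, key=lambda name: cosine(eeg_embedding, label_embeddings[name]))


# Toy usage with made-up 2-D embeddings.
labels = {"positive": [1.0, 0.0], "negative": [0.0, 1.0]}
print(zero_shot_classify([0.8, 0.2], labels))  # positive
print(retrieval_top_k_accuracy([[1.0, 0.0], [0.0, 1.0]],
                               [[1.0, 0.0], [0.0, 1.0]], k=1))  # 1.0
```

Both metrics compare embeddings directly, so they test whether the EEG representation carries the intended semantics without depending on word overlap between a generated sentence and its reference.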

Problem

Address hallucination in EEG-to-text decoding

Reframe decoding as semantic summarization

Improve semantic grounding with interpretable EEG representations

Innovation

Generative Language Inspection Model (GLIM)

Semantic summarization of EEG data

EEG-text retrieval and classification