🤖 AI Summary
To address the challenge of decoding visual neural representations from low signal-to-noise ratio (SNR) EEG signals, this paper proposes a multimodal semantic-enhanced decoding framework. Methodologically: (1) it constructs a text-semantic-guided shared multimodal embedding space to align EEG, image, and text modalities; (2) it introduces a modality-consistency dynamic balancing strategy to adaptively weight each modality; and (3) it incorporates a stochastic perturbation regularization term with dynamic Gaussian noise, while employing adapters to fuse pretrained vision and language features for improved robustness. Evaluated on the ThingsEEG dataset, the framework surpasses state-of-the-art methods, with absolute improvements of 2.0% and 4.7% in Top-1 and Top-5 accuracy, respectively. Its core contribution is to dynamically leverage textual semantics as prior knowledge for EEG-based visual decoding, enabling noise-robust and modality-adaptive multimodal representation learning.
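The text-guided shared embedding space described in point (1) could plausibly be realized as a symmetric contrastive objective that pulls each EEG embedding toward the image and text embeddings of the same class. The sketch below is a minimal illustration under that assumption; the function name, the InfoNCE-style loss form, and the temperature value are all hypothetical, as the summary does not specify the exact objective.

```python
import torch
import torch.nn.functional as F

def alignment_loss(eeg_emb, img_emb, txt_emb, temperature=0.07):
    """Hypothetical sketch of text-guided multimodal alignment:
    a symmetric InfoNCE-style loss that pulls the i-th EEG embedding
    toward the i-th image and text embeddings in a shared space.
    The paper's actual objective may differ."""
    # Project all modalities onto the unit sphere so that dot
    # products act as cosine similarities.
    eeg = F.normalize(eeg_emb, dim=-1)
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    # Matching pairs sit on the diagonal of the similarity matrix.
    labels = torch.arange(eeg.size(0))
    loss_eeg_img = F.cross_entropy(eeg @ img.t() / temperature, labels)
    loss_eeg_txt = F.cross_entropy(eeg @ txt.t() / temperature, labels)
    return 0.5 * (loss_eeg_img + loss_eeg_txt)
```

In this formulation the text branch supplies explicit class-level anchors, so EEG and image features of the same category are drawn toward the same text representation.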
📝 Abstract
In this work, we propose an innovative framework that integrates EEG, image, and text data to decode visual neural representations from low signal-to-noise ratio EEG signals. Specifically, we introduce a text modality to enhance the semantic correspondence between EEG signals and visual content. With the explicit semantic labels provided by text, image and EEG features of the same category can be aligned more closely with the corresponding text representations in a shared multimodal space. To fully utilize pre-trained visual and textual representations, we propose an adapter module that alleviates the instability of high-dimensional representations while facilitating the alignment and fusion of cross-modal features. Additionally, to alleviate the imbalance in multimodal feature contributions introduced by the textual representations, we propose a Modal Consistency Dynamic Balance (MCDB) strategy that dynamically adjusts the contribution weight of each modality. We further propose a stochastic perturbation regularization (SPR) term that introduces dynamic Gaussian noise during modality optimization to enhance the model's generalization under semantic perturbations. Evaluation on the ThingsEEG dataset shows that our method surpasses previous state-of-the-art methods in both Top-1 and Top-5 accuracy, improving them by 2.0% and 4.7%, respectively.
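The adapter module and the MCDB weighting might be sketched as follows. The bottleneck-with-residual adapter shape and the loss-based softmax weighting are common patterns assumed here for illustration; the abstract does not give the actual architecture or weighting rule, so every name and dimension below is hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Adapter(nn.Module):
    """Hypothetical bottleneck adapter with a residual connection,
    applied on top of frozen pretrained visual/textual features to
    stabilize high-dimensional representations."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)  # compress
        self.up = nn.Linear(bottleneck, dim)    # expand back
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual keeps the pretrained feature as the default path.
        return x + self.up(F.relu(self.down(x)))

def mcdb_weights(modality_losses: list[torch.Tensor]) -> torch.Tensor:
    """Hypothetical dynamic balancing rule: modalities whose current
    alignment loss is lower (more consistent) receive larger weight,
    via a softmax over the negated losses."""
    return F.softmax(-torch.stack(modality_losses), dim=0)
```

With this rule, a modality whose features agree poorly with the shared space is automatically down-weighted each step, which matches the stated goal of countering the imbalance introduced by the text branch.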
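The SPR term's "dynamic" Gaussian noise could be interpreted as a noise scale that changes over the course of training; a simple linear decay schedule is assumed in the sketch below. The function name, the schedule, and the base scale `sigma0` are illustrative guesses, not the paper's specification.

```python
import torch

def spr_noise(emb: torch.Tensor, step: int, total_steps: int,
              sigma0: float = 0.1) -> torch.Tensor:
    """Hypothetical stochastic perturbation regularization: add
    Gaussian noise to modality embeddings during optimization, with
    a scale that decays linearly from sigma0 to zero over training."""
    sigma = sigma0 * (1.0 - step / total_steps)
    return emb + sigma * torch.randn_like(emb)
```

Perturbing the embeddings early in training and annealing the noise away would encourage representations that remain aligned under small semantic perturbations, which is the generalization effect the abstract attributes to SPR.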