🤖 AI Summary
This study addresses the challenge of improving decoding accuracy and generalizability of visual images from neural signals to advance brain–computer interfaces and deepen understanding of human visual perception. We propose the first human-perception-aligned image encoder, which directly maps multimodal neural signals (e.g., EEG, fMRI) into a semantically consistent visual representation space via cross-modal representation alignment, enabling end-to-end image decoding. Our key contribution is the integration of human behavioral and neural response constraints into the image encoder, substantially enhancing its capacity to model perceptual representations under rapid visual stimulation. Experiments demonstrate up to a 21% improvement in zero-shot image retrieval accuracy. Moreover, the method exhibits robust performance gains across diverse EEG architectures, image encoders, alignment strategies, subjects, and neuroimaging modalities (EEG and fMRI).
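Cross-modal representation alignment of the kind described above is commonly trained with a symmetric contrastive (InfoNCE) objective over paired brain and image embeddings, as in CLIP-style methods. The summary does not specify the paper's exact loss, so the following is a minimal illustrative sketch with made-up shapes, not the authors' implementation:

```python
import numpy as np

def info_nce(brain: np.ndarray, image: np.ndarray, temperature: float = 0.07) -> float:
    """Symmetric contrastive loss: matching (brain_i, image_i) pairs are
    pulled together in the shared space, mismatched pairs pushed apart."""
    b = brain / np.linalg.norm(brain, axis=1, keepdims=True)
    v = image / np.linalg.norm(image, axis=1, keepdims=True)
    logits = (b @ v.T) / temperature        # (N, N) cosine-similarity matrix
    labels = np.arange(len(b))              # embedding i matches embedding i

    def xent(l: np.ndarray) -> float:
        # Cross-entropy of the diagonal (matching) entries under log-softmax.
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Average the brain->image and image->brain directions.
    return (xent(logits) + xent(logits.T)) / 2

# Toy check on synthetic embeddings: perfectly aligned pairs yield a
# lower loss than deliberately mismatched ones.
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))
print(info_nce(emb, emb) < info_nce(emb, emb[::-1].copy()))  # -> True
```

In practice the two encoders (a brain-signal encoder and the frozen or fine-tuned image encoder) would produce these embeddings; here both sides are synthetic arrays purely to exercise the loss.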
📝 Abstract
Decoding visual images from brain activity has significant potential for advancing brain–computer interaction and enhancing the understanding of human perception. Recent approaches align the representation spaces of images and brain activity to enable visual decoding. In this paper, we introduce the use of human-aligned image encoders to map brain signals to images. We hypothesize that these models more effectively capture perceptual attributes associated with the rapid visual stimulus presentations commonly used in visual brain data recording experiments. Our empirical results support this hypothesis, demonstrating that this simple modification improves image retrieval accuracy by up to 21% compared to state-of-the-art methods. Comprehensive experiments confirm consistent performance improvements across diverse EEG architectures, image encoders, alignment methods, participants, and brain imaging modalities.
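Once brain and image embeddings share a space, the zero-shot image retrieval evaluated above reduces to a nearest-neighbor lookup by cosine similarity: decode the brain signal into the shared space, then return the gallery image whose embedding is most similar. A minimal sketch with synthetic data (array shapes and names are illustrative assumptions, not the paper's setup):

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Normalize to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def zero_shot_retrieve(brain_emb: np.ndarray, image_embs: np.ndarray) -> int:
    """Return the index of the gallery image whose embedding is most
    similar to the (already aligned) brain-signal embedding."""
    sims = l2_normalize(image_embs) @ l2_normalize(brain_emb)
    return int(np.argmax(sims))

# Toy gallery: 3 images embedded in a 4-d shared space (synthetic data).
gallery = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0, 0.0]])
# A noisy "decoded" brain embedding that should match image 1.
query = np.array([0.1, 0.9, 0.05, 0.0])
print(zero_shot_retrieve(query, gallery))  # -> 1
```

Retrieval is "zero-shot" in the sense that the gallery images need never appear during alignment training; any image the encoder can embed becomes a retrieval candidate.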