🤖 AI Summary
This work addresses the challenge of decoding visual perception and semantic information from electroencephalography (EEG) signals in order to reconstruct the corresponding images. We propose an end-to-end linear decoding framework that leverages EEG's high temporal resolution to directly map preprocessed EEG traces into multi-level image representation spaces—including the CLIP semantic space, the Stable Diffusion latent space, and a low-level feature space—which then condition a frozen diffusion model for image generation. Without complex neural architectures, our approach achieves hierarchical spatiotemporal disentanglement of visual processing via cross-modal linear projection and backward feature attribution. Our method achieves state-of-the-art (SOTA) reconstruction fidelity and is the first to isolate distinct spatiotemporal EEG patterns selectively associated with semantics, texture, and chromaticity. Furthermore, we introduce the *Perceptogram*—an interpretable visualization tool that probes the hierarchical structure of visual perception in the human brain.
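The core of the framework is a plain linear map from flattened EEG trials to an image-representation space. A minimal sketch of such a decoder, using closed-form ridge regression on synthetic data (all shapes and the regularization strength are illustrative assumptions, not the paper's actual settings; 768 is the embedding width of CLIP ViT-L/14):

```python
import numpy as np

# Illustrative shapes (assumptions, not the paper's dimensions)
n_trials, n_channels, n_times = 200, 16, 50   # EEG: trials x channels x time samples
d_clip = 768                                   # CLIP ViT-L/14 embedding dimension

rng = np.random.default_rng(0)
X = rng.standard_normal((n_trials, n_channels * n_times))  # flattened EEG trials
Y = rng.standard_normal((n_trials, d_clip))                # target CLIP latents

# Closed-form ridge regression: W = (X^T X + lam*I)^{-1} X^T Y
lam = 1e3
W = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

Y_hat = X @ W  # decoded CLIP latents, one 768-dim vector per EEG trial
```

In the full pipeline the decoded latents `Y_hat` would condition a frozen pre-trained diffusion model to generate images; the same linear recipe is reused for the other latent spaces.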
📝 Abstract
Visual neural decoding from EEG has improved significantly with diffusion models that can reconstruct high-quality images from decoded latents. While recent works have focused on relatively complex architectures to achieve good reconstruction performance from EEG, less attention has been paid to the source of this information. In this work, we attempt to discover EEG features that represent perceptual and semantic visual categories, using a simple pipeline. Notably, the high temporal resolution of EEG allows us to go beyond the static semantic maps obtained from fMRI. We show that (a) training a simple linear decoder from EEG to the CLIP latent space, followed by a frozen pre-trained diffusion model, is sufficient to decode images with state-of-the-art reconstruction performance; (b) mapping the decoded latents back to EEG with a linear encoder isolates CLIP-relevant spatiotemporal EEG features; and (c) repeating this procedure with latent spaces representing lower-level image features yields analogous time courses of texture- and hue-related information. We thus use our framework, Perceptogram, to probe EEG signals at multiple levels of the visual information hierarchy.
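Step (b), mapping decoded latents back to EEG, amounts to fitting a second linear model in the reverse direction and reading its weights as spatiotemporal maps. A hedged sketch of this backward attribution idea on synthetic data (all names and shapes here are assumptions for illustration):

```python
import numpy as np

# Illustrative shapes (assumptions, not the paper's dimensions)
n_trials, n_channels, n_times, d_latent = 150, 32, 80, 16

rng = np.random.default_rng(1)
Z = rng.standard_normal((n_trials, d_latent))              # decoded latents per trial
E = rng.standard_normal((n_trials, n_channels * n_times))  # flattened EEG per trial

# Least-squares linear encoder: predict EEG from latents (A maps latents -> EEG)
A, *_ = np.linalg.lstsq(Z, E, rcond=None)  # shape: (d_latent, n_channels * n_times)

# Each latent dimension's row of A, reshaped, is a channels-x-time attribution map
maps = A.reshape(d_latent, n_channels, n_times)
```

Visualizing each slice of `maps` as a channels-by-time image is one plausible way to inspect when and where latent-relevant information appears in the EEG, exploiting the temporal resolution the abstract emphasizes.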