CognitionCapturerPro: Towards High-Fidelity Visual Decoding from EEG/MEG via Multi-modal Information and Asymmetric Alignment

📅 2026-03-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the fidelity loss and representational shift inherent in reconstructing visual stimuli from EEG/MEG signals by proposing a collaborative training framework that integrates multimodal priors—specifically image, text, depth, and edge cues. The approach combines a streamlined alignment module with a pretrained diffusion model and introduces an uncertainty-weighted similarity scoring mechanism to quantify the fidelity of each modality. Furthermore, a fusion encoder is designed to integrate shared representations across modalities, enabling more precise cross-modal alignment. Evaluated on the THINGS-EEG dataset, the method achieves substantial improvements over the state-of-the-art CognitionCapturer, with Top-1 and Top-5 retrieval accuracy gains of 25.9% and 10.6%, respectively.

📝 Abstract
Visual stimuli reconstruction from EEG remains challenging due to fidelity loss and representation shift. We propose CognitionCapturerPro, an enhanced framework that integrates EEG with multi-modal priors (images, text, depth, and edges) via collaborative training. Our core contributions include an uncertainty-weighted similarity scoring mechanism to quantify modality-specific fidelity and a fusion encoder for integrating shared representations. By employing a simplified alignment module and a pre-trained diffusion model, our method significantly outperforms the original CognitionCapturer on the THINGS-EEG dataset, improving Top-1 and Top-5 retrieval accuracy by 25.9% and 10.6%, respectively. Code is available at: https://github.com/XiaoZhangYES/CognitionCapturerPro.
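The paper does not publish the exact form of its uncertainty-weighted similarity scoring, so the following is only a minimal sketch of the general idea: score an EEG embedding against each modality prior (image, text, depth, edge), then down-weight modalities whose predicted uncertainty (here modeled as a log-variance, a common convention) is high. All function and variable names are hypothetical, not taken from the CognitionCapturerPro codebase.

```python
import numpy as np

def cosine_sim(a, b):
    # Standard cosine similarity between two embedding vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def uncertainty_weighted_score(eeg_emb, modality_embs, log_vars):
    """Fuse per-modality similarities into one retrieval score.

    modality_embs: one embedding per prior (e.g. image, text, depth, edge).
    log_vars: one predicted log-variance per modality; larger means the
    modality is considered less reliable and receives less weight.
    (Hypothetical formulation, not the authors' exact mechanism.)
    """
    sims = np.array([cosine_sim(eeg_emb, m) for m in modality_embs])
    # Inverse-uncertainty weights, normalized to sum to 1:
    # confident modalities (small log-variance) dominate the fused score.
    w = np.exp(-np.asarray(log_vars, dtype=float))
    w = w / w.sum()
    return float((w * sims).sum())
```

With equal log-variances this reduces to the plain mean of the per-modality similarities; as one modality's uncertainty shrinks, the fused score converges to that modality's similarity alone.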
Problem

Research questions and friction points this paper is trying to address.

visual decoding
EEG
fidelity loss
representation shift
visual reconstruction
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-modal fusion
uncertainty-weighted alignment
EEG-based visual reconstruction
diffusion model
cross-modal representation
Kaifan Zhang
Visual Information Processing Laboratory, School of Electronic Engineering, Xidian University, Xi'an 710071, China
Lihuo He
Professor, Xidian University
Image/Video Quality Assessment, Visual Perception
Junjie Ke
Xidian University
Object Detection
Yuqi Ji
Visual Information Processing Laboratory, School of Electronic Engineering, Xidian University, Xi'an 710071, China
Lukun Wu
Visual Information Processing Laboratory, School of Electronic Engineering, Xidian University, Xi'an 710071, China
Lizi Wang
School of Artificial Intelligence, Beijing Normal University, Beijing 100875, China
Xinbo Gao
Visual Information Processing Laboratory, School of Electronic Engineering, Xidian University, Xi'an 710071, China