Subject-Aware Multi-Granularity Alignment for Zero-Shot EEG-to-Image Retrieval

📅 2026-04-20

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Existing zero-shot EEG-to-image retrieval methods generalize poorly across subjects because they ignore individual differences in how EEG signals reflect visual representations at multiple granularities. This work proposes SAMGA, a framework that introduces what it presents as the first subject-aware, multi-granularity visual supervision objective and combines it with a coarse-to-fine cross-modal alignment strategy. SAMGA adaptively aggregates intermediate-layer features from a pretrained visual encoder. Within a single shared encoder, it improves both the stability of the shared semantic geometry and instance-level discriminability, balancing subject-specific neural response characteristics against cross-subject generalization. On the THINGS-EEG benchmark, the method achieves intra-subject Top-1 and Top-5 retrieval accuracies of 91.3% and 98.8%, respectively, and cross-subject accuracies of 34.4% and 64.8%, outperforming recent state-of-the-art approaches.
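Top-1 and Top-5 figures like these are retrieval accuracies: a query counts as correct if its matching image ranks first (or within the first five) among all candidates. The following is a minimal sketch of how such a metric is typically computed. It assumes that each EEG query and each candidate image have already been embedded into a shared space and that retrieval ranks the whole gallery by cosine similarity. The summary does not state the exact THINGS-EEG protocol (gallery size, trial averaging), and `topk_accuracy` and the toy embeddings are hypothetical, not the authors' evaluation code.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def topk_accuracy(eeg_emb: List[List[float]],
                  img_emb: List[List[float]],
                  labels: List[int],
                  k: int) -> float:
    """Share of EEG queries whose matching image (labels[i] indexes img_emb)
    is among the k gallery images most similar to the query."""
    hits = 0
    for query, true_idx in zip(eeg_emb, labels):
        scores = [cosine(query, img) for img in img_emb]
        # Rank all candidate images by descending cosine similarity.
        ranked = sorted(range(len(img_emb)), key=lambda j: scores[j], reverse=True)
        if true_idx in ranked[:k]:
            hits += 1
    return hits / len(eeg_emb)


# Toy example: 3 EEG queries against a 3-image gallery (hypothetical data).
images = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
queries = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.9]]
truth = [0, 1, 2]
print(topk_accuracy(queries, images, truth, k=1))  # 0.666... (query 2 ranks its image 3rd)
print(topk_accuracy(queries, images, truth, k=3))  # 1.0
```

Because Top-k only checks whether the true image falls in the first k ranks, Top-5 is always at least as high as Top-1. This is consistent with the gap between the reported 91.3% and 98.8%.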


📝 Abstract

Zero-shot EEG-to-image retrieval aims to decode perceived visual content from electroencephalography (EEG) by aligning neural responses with pretrained visual representations, offering a promising route toward scalable visual neural decoding and practical brain-computer interfaces. However, robust EEG-to-image retrieval remains challenging because prior methods usually rely on either a single fixed visual target or a subject-invariant target construction scheme. Such designs overlook two important properties of visually evoked EEG signals: they preserve information across multiple representational scales, and the visual granularity that best matches the EEG may vary across subjects. To address these issues, a subject-aware multi-granularity alignment (SAMGA) framework is proposed for zero-shot EEG-to-image retrieval. SAMGA first constructs a subject-aware visual supervision target by adaptively aggregating multiple intermediate representations from a pretrained vision encoder. This lets the model absorb subject-dependent granularity deviations during training while keeping inference subject-agnostic. Building on this adaptive target construction, a coarse-to-fine cross-modal alignment strategy is designed around a shared encoder. The coarse stage stabilizes the shared semantic geometry and reduces subject-induced distribution shift, and the fine stage further improves instance-level retrieval discrimination. Extensive experiments on the THINGS-EEG benchmark show that the proposed method achieves 91.3% Top-1 and 98.8% Top-5 accuracy in the intra-subject setting and 34.4% Top-1 and 64.8% Top-5 accuracy in the inter-subject setting, outperforming recent state-of-the-art methods.
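The subject-aware target can be illustrated with a minimal sketch. It assumes that the target is a softmax-weighted sum of intermediate-layer image features, with one hypothetical vector of learnable logits per subject, so each subject can favor a different granularity. The abstract does not give the exact parameterization. A standard CLIP-style InfoNCE loss stands in here for the alignment objective. It is not the paper's coarse-to-fine losses, which the abstract does not specify. All names (`subject_aware_target`, `subject_logits`, `info_nce`) are illustrative.

```python
import math
from typing import Dict, List

Vector = List[float]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def subject_aware_target(layer_feats: List[Vector],
                         subject_logits: Dict[int, List[float]],
                         subject_id: int) -> Vector:
    """Aggregate intermediate-layer image features into one training target.

    layer_feats[l] is a (hypothetical) feature from layer l of a frozen
    vision encoder; subject_logits[s] holds one learnable logit per layer
    for subject s, so each subject can weight granularities differently.
    """
    weights = softmax(subject_logits[subject_id])
    dim = len(layer_feats[0])
    return [sum(w * feat[d] for w, feat in zip(weights, layer_feats))
            for d in range(dim)]


def l2_normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(eeg: List[Vector], targets: List[Vector], temperature: float) -> float:
    """Mean EEG-to-image InfoNCE loss: eeg[i] and targets[i] form the positive
    pair, and every other target in the batch acts as a negative."""
    eeg = [l2_normalize(v) for v in eeg]
    targets = [l2_normalize(t) for t in targets]
    total = 0.0
    for i, e in enumerate(eeg):
        logits = [sum(a * b for a, b in zip(e, t)) / temperature for t in targets]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]  # cross-entropy with the positive as the label
    return total / len(eeg)


# Toy usage: two "layers" (e.g. a shallow and a deep feature) for one image.
feats = [[1.0, 0.0], [0.0, 1.0]]
logits = {0: [0.0, 0.0], 1: [math.log(3.0), 0.0]}
print(subject_aware_target(feats, logits, 0))  # equal weights -> [0.5, 0.5]
print(subject_aware_target(feats, logits, 1))  # favours layer 0 -> about [0.75, 0.25]
```

In this sketch the subject ID enters only the target construction, which is used during training. A trained EEG encoder can therefore be evaluated against plain image embeddings without knowing the subject. This mirrors the abstract's description of subject-agnostic inference.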

Problem

Research questions and friction points this paper is trying to address.

zero-shot EEG-to-image retrieval

multi-granularity alignment

subject-aware modeling

visual neural decoding

EEG signal variability

Innovation

Methods, ideas, or system contributions that make the work stand out.

subject-aware alignment

multi-granularity representation

zero-shot EEG-to-image retrieval

cross-modal alignment

visual neural decoding

🔎 Similar Papers

Achieving more human brain-like vision via human EEG representational alignment

2024-01-30 · arXiv.org · Citations: 4

EEG-ImageNet: An Electroencephalogram Dataset and Benchmarks with Image Visual Stimuli of Multi-Granularity Labels

2024-06-11 · arXiv.org · Citations: 6