STAMBRIDGE: Spectral-Temporal Amplitude-aware Mid-Feature Bridge for EEG Visual Decoding

📅 2026-05-21
🤖 AI Summary
This work addresses unstable cross-modal alignment between electroencephalography (EEG) and vision-language representations, which stems mainly from the gap between low signal-to-noise neural signals and highly structured vision-language spaces. The authors propose a two-stage framework. First, Spectral-Temporal Amplitude-aware Modulation (STAM) extracts well-conditioned EEG features by replacing conventional hard frequency masking with amplitude-derived soft channel weighting and multi-scale temporal convolutions. Second, a model-agnostic Mid-Feature Semantic Bridge (MFSB) builds a regularized intermediate space that enables staged semantic distillation and more stable alignment. On the THINGS-EEG benchmark, the method reaches 34.50% Top-1 and 65.95% Top-5 accuracy in 200-way zero-shot retrieval, and its embeddings yield semantically coherent image reconstructions when passed to a diffusion model.
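To make the contrast between hard frequency masking and amplitude-derived soft channel weighting concrete, here is a minimal NumPy sketch. It is not the authors' STAM implementation: the function names, the 8-13 Hz band, the 128 Hz sampling rate, and the choice of a softmax over log in-band amplitude are all assumptions made for illustration.

```python
import numpy as np

def hard_band_mask(x, fs, lo, hi):
    """Brick-wall filter: zero every rFFT bin outside [lo, hi] Hz, then invert.
    The sharp spectral edge is what can cause ringing in the time domain."""
    spec = np.fft.rfft(x, axis=-1)
    freqs = np.fft.rfftfreq(x.shape[-1], d=1.0 / fs)
    keep = (freqs >= lo) & (freqs <= hi)
    return np.fft.irfft(spec * keep, n=x.shape[-1], axis=-1)

def soft_channel_weights(x, fs, lo, hi, temperature=1.0):
    """Hypothetical soft weighting: softmax over channels of the log of the
    mean in-band spectral amplitude. No frequency content is removed."""
    amp = np.abs(np.fft.rfft(x, axis=-1))
    freqs = np.fft.rfftfreq(x.shape[-1], d=1.0 / fs)
    band = (freqs >= lo) & (freqs <= hi)
    score = amp[:, band].mean(axis=-1)            # one score per channel
    logits = np.log(score + 1e-12) / temperature
    logits -= logits.max()                        # numerical stability
    w = np.exp(logits)
    return w / w.sum()

# Toy data (assumed shapes): 4 channels, 256 samples at 128 Hz.
fs = 128.0
rng = np.random.default_rng(0)
x = 0.1 * rng.standard_normal((4, 256))
t = np.arange(256) / fs
x[2] += np.sin(2 * np.pi * 10.0 * t)              # channel 2 carries a 10 Hz rhythm

w = soft_channel_weights(x, fs, lo=8.0, hi=13.0)
x_soft = x * w[:, None] * x.shape[0]              # rescale so the mean weight is 1
x_hard = hard_band_mask(x, fs, lo=8.0, hi=13.0)
```

In this toy setup the soft weights emphasize channel 2 while leaving every channel's waveform shape intact, whereas the hard mask discards all out-of-band content from every channel.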
📝 Abstract
Electroencephalography (EEG) visual decoding remains challenging due to the modality gap between low-SNR neural signals and highly structured vision-language spaces, making direct cross-modal alignment unstable. To address this, we propose STAMBRIDGE, a versatile two-stage framework that sequentially tackles feature conditioning and cross-modal alignment. First, we introduce a Spectral-Temporal Amplitude-aware Modulation (STAM) to extract well-conditioned EEG representations. By replacing hard frequency masking with amplitude-derived soft channel weighting and multi-scale temporal convolutions, STAM explicitly preserves frequency-aware transients while reducing the risk of time-domain ringing artifacts. Building upon these robust neural features, we further introduce a model-agnostic Mid-Feature Semantic Bridge (MFSB) that constructs a regularized intermediate space through directed cross-modal interactions, enabling staged distillation and more stable semantic alignment. Experiments on the THINGS-EEG benchmark show competitive 200-way zero-shot retrieval performance, with 34.50% Top-1 and 65.95% Top-5 accuracy. In addition, embeddings learned by STAMBRIDGE produce semantically coherent image reconstructions with a diffusion model, demonstrating robust EEG-to-vision semantic alignment. The code is available at: https://github.com/thabeatmjh/STAMBRIDGE.
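For context on the reported numbers, chance level in a 200-way retrieval task is 1/200 = 0.5% for Top-1 and 5/200 = 2.5% for Top-5. The sketch below shows one common way such Top-k retrieval accuracy can be computed from paired embeddings. It assumes cosine similarity and randomly generated 64-dimensional embeddings; the function name and all data are hypothetical and are not taken from the STAMBRIDGE repository.

```python
import numpy as np

def topk_retrieval_accuracy(eeg_emb, img_emb, labels, ks=(1, 5)):
    """Fraction of EEG queries whose true image is among the k most
    cosine-similar candidates, for each k in ks."""
    eeg = eeg_emb / np.linalg.norm(eeg_emb, axis=1, keepdims=True)
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    sim = eeg @ img.T                        # (n_queries, n_candidates)
    ranked = np.argsort(-sim, axis=1)        # best match first
    return {
        k: float((ranked[:, :k] == labels[:, None]).any(axis=1).mean())
        for k in ks
    }

# Toy data: 200 candidate images, one EEG query per image (assumed setup).
rng = np.random.default_rng(0)
img_emb = rng.standard_normal((200, 64))
labels = np.arange(200)

# Perfectly aligned queries retrieve their own image every time.
acc_clean = topk_retrieval_accuracy(img_emb.copy(), img_emb, labels)

# Heavily corrupted queries give lower, noise-dependent accuracy.
noisy_eeg = img_emb + 3.0 * rng.standard_normal(img_emb.shape)
acc_noisy = topk_retrieval_accuracy(noisy_eeg, img_emb, labels)
```

Top-5 accuracy is never lower than Top-1 for the same queries, because the Top-5 candidate set contains the Top-1 candidate.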
Problem

Research questions and friction points this paper is trying to address.

EEG visual decoding
modality gap
cross-modal alignment
low-SNR neural signals
Innovation

Methods, ideas, or system contributions that make the work stand out.

Spectral-Temporal Amplitude-aware Modulation
Mid-Feature Semantic Bridge
EEG visual decoding
cross-modal alignment
zero-shot retrieval
Jiahe Meng
Lab of Digital Image and Intelligent Computation, Shanghai Maritime University, Shanghai 201306, China
Weiming Zeng
Lab of Digital Image and Intelligent Computation, Shanghai Maritime University, Shanghai 201306, China
Yueyang Li
The Hong Kong Polytechnic University
Brain-computer Interface, Functional MRI, Neural Decoding, Emotion
Bo Chai
Global Energy Interconnection Research Institute
smart grid, graph database
Hongjie Yan
Department of Neurology, Affiliated Lianyungang Hospital of Xuzhou Medical University, Lianyungang 222002, China
Zhiguo Zhang
Institute of Computing and Intelligence, Harbin Institute of Technology Shenzhen, Shenzhen 518000, China
Wai Ting Siok
The Hong Kong Polytechnic University
Reading development, Chinese reading, Developmental dyslexia, Neuroimaging, fMRI
Nizhuan Wang
The Hong Kong Polytechnic University (PolyU)
AI, Brain-Computer Interface, Neuroimaging, Computational Linguistics, Neurolinguistics