🤖 AI Summary
To address the challenge of decoding visual neural representations from low signal-to-noise ratio (SNR) EEG signals, this paper proposes a multimodal semantic-enhanced decoding framework. Methodologically: (1) it constructs a text-semantic-guided shared multimodal embedding space to align EEG, image, and text modalities; (2) it introduces a modality-consistency dynamic balancing strategy to adaptively weight each modality; and (3) it incorporates a stochastic perturbation regularization term with dynamic Gaussian noise, while employing adapters to fuse pretrained vision and language features for improved robustness. Evaluated on the ThingsEEG dataset, the framework surpasses state-of-the-art methods, with absolute improvements of 2.0% and 4.7% in Top-1 and Top-5 accuracy, respectively. Its core contribution is to dynamically leverage textual semantics as prior knowledge for EEG-based visual decoding, enabling noise-robust and modality-adaptive multimodal representation learning.
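The text-guided shared embedding space described in point (1) could plausibly be realized as a symmetric contrastive objective that pulls each EEG embedding toward the image and text embeddings of the same class. The sketch below is a minimal illustration under that assumption; the function name, the InfoNCE-style loss form, and the temperature value are all hypothetical, as the summary does not specify the exact objective.

```python
import torch
import torch.nn.functional as F

def alignment_loss(eeg_emb, img_emb, txt_emb, temperature=0.07):
    """Hypothetical sketch of text-guided multimodal alignment:
    a symmetric InfoNCE-style loss that pulls the i-th EEG embedding
    toward the i-th image and text embeddings in a shared space.
    The paper's actual objective may differ."""
    # Project all modalities onto the unit sphere so that dot
    # products act as cosine similarities.
    eeg = F.normalize(eeg_emb, dim=-1)
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    # Matching pairs sit on the diagonal of the similarity matrix.
    labels = torch.arange(eeg.size(0))
    loss_eeg_img = F.cross_entropy(eeg @ img.t() / temperature, labels)
    loss_eeg_txt = F.cross_entropy(eeg @ txt.t() / temperature, labels)
    return 0.5 * (loss_eeg_img + loss_eeg_txt)
```

In this formulation the text branch supplies explicit class-level anchors, so EEG and image features of the same category are drawn toward the same text representation.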
📝 Abstract
In this work, we propose an innovative framework that integrates EEG, image, and text data to decode visual neural representations from low signal-to-noise ratio EEG signals. Specifically, we introduce a text modality to enhance the semantic correspondence between EEG signals and visual content. With the explicit semantic labels provided by text, image and EEG features of the same category can be aligned more closely with the corresponding text representations in a shared multimodal space. To fully utilize pre-trained visual and textual representations, we propose an adapter module that alleviates the instability of high-dimensional representations while facilitating the alignment and fusion of cross-modal features. Additionally, to alleviate the imbalance in multimodal feature contributions introduced by the textual representations, we propose a Modal Consistency Dynamic Balance (MCDB) strategy that dynamically adjusts the contribution weight of each modality. We further propose a stochastic perturbation regularization (SPR) term that introduces dynamic Gaussian noise during modality optimization to enhance the model's generalization under semantic perturbations. Evaluation on the ThingsEEG dataset shows that our method surpasses previous state-of-the-art methods in both Top-1 and Top-5 accuracy, improving them by 2.0% and 4.7%, respectively.
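The adapter module and the MCDB weighting might be sketched as follows. The bottleneck-with-residual adapter shape and the loss-based softmax weighting are common patterns assumed here for illustration; the abstract does not give the actual architecture or weighting rule, so every name and dimension below is hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Adapter(nn.Module):
    """Hypothetical bottleneck adapter with a residual connection,
    applied on top of frozen pretrained visual/textual features to
    stabilize high-dimensional representations."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)  # compress
        self.up = nn.Linear(bottleneck, dim)    # expand back
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual keeps the pretrained feature as the default path.
        return x + self.up(F.relu(self.down(x)))

def mcdb_weights(modality_losses: list[torch.Tensor]) -> torch.Tensor:
    """Hypothetical dynamic balancing rule: modalities whose current
    alignment loss is lower (more consistent) receive larger weight,
    via a softmax over the negated losses."""
    return F.softmax(-torch.stack(modality_losses), dim=0)
```

With this rule, a modality whose features agree poorly with the shared space is automatically down-weighted each step, which matches the stated goal of countering the imbalance introduced by the text branch.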
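The SPR term's "dynamic" Gaussian noise could be interpreted as a noise scale that changes over the course of training; a simple linear decay schedule is assumed in the sketch below. The function name, the schedule, and the base scale `sigma0` are illustrative guesses, not the paper's specification.

```python
import torch

def spr_noise(emb: torch.Tensor, step: int, total_steps: int,
              sigma0: float = 0.1) -> torch.Tensor:
    """Hypothetical stochastic perturbation regularization: add
    Gaussian noise to modality embeddings during optimization, with
    a scale that decays linearly from sigma0 to zero over training."""
    sigma = sigma0 * (1.0 - step / total_steps)
    return emb + sigma * torch.randn_like(emb)
```

Perturbing the embeddings early in training and annealing the noise away would encourage representations that remain aligned under small semantic perturbations, which is the generalization effect the abstract attributes to SPR.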