Unveiling Deep Semantic Uncertainty Perception for Language-Anchored Multi-modal Vision-Brain Alignment

📅 2025-11-06
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Neural signal decoding (EEG/MEG/fMRI) suffers from high inter-subject variability and severe entanglement of visual features; existing vision–brain alignment methods rely on shallow, visual-only semantic representations, limiting interpretability and robustness. Method: Bratrix, a language-anchored multimodal alignment framework—the first end-to-end, language-guided vision–brain alignment approach—combines an uncertainty perception module with a learnable language-anchored semantic matrix. It jointly enforces hierarchical semantic disentanglement, shared latent-space mapping, and uncertainty-weighted alignment, trained in two stages: single-modality pretraining followed by multimodal fine-tuning. Contribution/Results: State-of-the-art performance across EEG, MEG, and fMRI benchmarks, including a 14.3% gain on the 200-way EEG retrieval task, alongside substantial improvements in image reconstruction fidelity and semantic caption generation quality.

📝 Abstract
Unveiling visual semantics from neural signals such as EEG, MEG, and fMRI remains a fundamental challenge due to subject variability and the entangled nature of visual features. Existing approaches primarily align neural activity directly with visual embeddings, but visual-only representations often fail to capture latent semantic dimensions, limiting interpretability and robustness. To address these limitations, we propose Bratrix, the first end-to-end framework to achieve multimodal Language-Anchored Vision-Brain alignment. Bratrix decouples visual stimuli into hierarchical visual and linguistic semantic components, and projects both visual and brain representations into a shared latent space, enabling the formation of aligned visual-language and brain-language embeddings. To emulate human-like perceptual reliability and handle noisy neural signals, Bratrix incorporates a novel uncertainty perception module that applies uncertainty-aware weighting during alignment. By leveraging learnable language-anchored semantic matrices to enhance cross-modal correlations and employing a two-stage training strategy of single-modality pretraining followed by multimodal fine-tuning, Bratrix-M improves alignment precision. Extensive experiments on EEG, MEG, and fMRI benchmarks demonstrate that Bratrix improves retrieval, reconstruction, and captioning performance compared to state-of-the-art methods, surpassing prior methods by 14.3% on the 200-way EEG retrieval task. Code and model are available.
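The uncertainty-aware weighting described above can be illustrated with a minimal sketch. This is not the paper's actual loss: it assumes a per-sample cosine alignment error between brain and language embeddings, down-weighted by a predicted log-variance in the heteroscedastic style of Kendall and Gal (2017); the function name and the exact regularizer are illustrative choices.

```python
import numpy as np

def uncertainty_weighted_alignment_loss(brain_emb, lang_emb, log_var):
    """Toy sketch of uncertainty-weighted vision-brain alignment.

    brain_emb, lang_emb: (N, D) arrays of paired embeddings.
    log_var: (N,) per-sample uncertainty predicted from the neural signal;
             noisier trials get larger log_var and thus smaller weight.
    """
    # L2-normalize so the dot product becomes cosine similarity.
    b = brain_emb / np.linalg.norm(brain_emb, axis=1, keepdims=True)
    t = lang_emb / np.linalg.norm(lang_emb, axis=1, keepdims=True)
    per_sample = 1.0 - np.sum(b * t, axis=1)   # cosine alignment error in [0, 2]
    precision = np.exp(-log_var)               # confidence weight per sample
    # log_var term keeps the model from declaring everything uncertain.
    return float(np.mean(precision * per_sample + log_var))
```

Perfectly aligned pairs with zero predicted uncertainty give a loss of zero; misaligned pairs contribute error scaled down by their predicted noise level.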
Problem

Research questions and friction points this paper is trying to address.

Aligning neural signals with visual semantics despite subject variability
Overcoming limitations of visual-only representations in brain decoding
Handling noisy neural data while achieving multimodal brain-vision alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Language-anchored multimodal vision-brain alignment framework
Uncertainty perception module for noisy neural signals
Two-stage training with learnable semantic matrices
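One way to picture a learnable language-anchored semantic matrix is as a set of K anchor vectors onto which embeddings are softly re-expressed. The sketch below is an assumption about the general mechanism, not the paper's implementation: the attention-over-anchors form, the `temperature` parameter, and the function names are all illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def anchor_project(emb, semantic_matrix, temperature=0.07):
    """Re-express each embedding as a soft mixture of K language anchors.

    emb: (N, D) brain or visual embeddings.
    semantic_matrix: (K, D) learnable language-anchored semantic matrix.
    Returns (N, D) anchored embeddings lying in the span of the anchors.
    """
    emb_n = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    anc_n = semantic_matrix / np.linalg.norm(semantic_matrix, axis=1, keepdims=True)
    attn = softmax(emb_n @ anc_n.T / temperature, axis=1)  # (N, K) anchor weights
    return attn @ semantic_matrix
```

In a two-stage regime, the encoders producing `emb` would be pretrained per modality first, and the semantic matrix refined jointly during multimodal fine-tuning; a low temperature makes the anchor assignment nearly hard.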
Authors

Zehui Feng
Shanghai Jiao Tong University, Shanghai, China
Chenqi Zhang
Shanghai Jiao Tong University, Shanghai, China
Mingru Wang
Shanghai Jiao Tong University, Shanghai, China
Minuo Wei
Shanghai Jiao Tong University, Shanghai, China
Shiwei Cheng
Zhejiang University of Technology, Hangzhou, China
Cuntai Guan
President's Chair Professor, CCDS, Nanyang Technological University
Ting Han
Shanghai Jiao Tong University, Shanghai, China; Zhejiang University, Hangzhou, China