🤖 AI Summary
Neural signal decoding (EEG/MEG/fMRI) suffers from high inter-subject variability and severe entanglement of visual features; existing vision–brain alignment methods lack interpretability and robustness because visual-only embeddings capture only shallow semantic structure.
Method: We propose a language-anchored multimodal alignment framework—the first end-to-end, language-guided vision–brain alignment approach—incorporating an uncertainty-aware module and a learnable semantic matrix. It jointly enforces hierarchical semantic disentanglement, shared latent-space mapping, and uncertainty-weighted alignment, trained via a two-stage strategy: unimodal pretraining followed by multimodal fine-tuning.
Contribution/Results: Our method achieves state-of-the-art performance across EEG, MEG, and fMRI benchmarks: +14.3% accuracy on the 200-way EEG retrieval task, alongside substantial improvements in image reconstruction fidelity and semantic caption generation quality.
📝 Abstract
Unveiling visual semantics from neural signals such as EEG, MEG, and fMRI remains a fundamental challenge due to subject variability and the entangled nature of visual features. Existing approaches primarily align neural activity directly with visual embeddings, but visual-only representations often fail to capture latent semantic dimensions, limiting interpretability and robustness. To address these limitations, we propose Bratrix, the first end-to-end framework to achieve multimodal Language-Anchored Vision-Brain alignment. Bratrix decouples visual stimuli into hierarchical visual and linguistic semantic components, and projects both visual and brain representations into a shared latent space, enabling the formation of aligned visual-language and brain-language embeddings. To emulate human-like perceptual reliability and handle noisy neural signals, Bratrix incorporates a novel uncertainty perception module that applies uncertainty-aware weighting during alignment. By leveraging learnable language-anchored semantic matrices to enhance cross-modal correlations and employing a two-stage training strategy of single-modality pretraining followed by multimodal fine-tuning, Bratrix-M improves alignment precision. Extensive experiments on EEG, MEG, and fMRI benchmarks demonstrate that Bratrix improves retrieval, reconstruction, and captioning performance over state-of-the-art methods, surpassing prior work by 14.3% on the 200-way EEG retrieval task. Code and models are available.
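To make the uncertainty-aware alignment idea concrete, the sketch below shows one plausible form it could take: an InfoNCE-style contrastive loss between brain and language embeddings in a shared latent space, where each sample's contribution is down-weighted by a predicted log-variance (in the spirit of heteroscedastic uncertainty weighting). All function names, the temperature value, and the exact weighting scheme are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def uncertainty_weighted_alignment_loss(brain_emb, lang_emb, log_var, temperature=0.07):
    """Hypothetical uncertainty-weighted brain-language alignment loss.

    brain_emb, lang_emb: (N, D) L2-normalized embeddings in a shared latent space,
        where row i of each matrix is a matched brain/language pair.
    log_var: (N,) predicted log-variance per sample; higher means a noisier signal.
    """
    # Cosine-similarity logits between every brain/language pair.
    logits = brain_emb @ lang_emb.T / temperature
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Per-sample InfoNCE loss: the matched pair sits on the diagonal.
    nll = -np.diag(log_probs)
    # Down-weight unreliable samples; the 0.5*log_var term penalizes the
    # degenerate solution of predicting infinite variance everywhere.
    weights = np.exp(-log_var)
    return float(np.mean(weights * nll + 0.5 * log_var))

def l2_normalize(x):
    """Row-wise L2 normalization."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)
```

Under this formulation, a noisy trial (large `log_var`) contributes less to the alignment gradient, mimicking how a human discounts unreliable percepts.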