Learning Brain Representation with Hierarchical Visual Embeddings

📅 2026-02-07

🤖 AI Summary

This work addresses a limitation of existing visual decoding approaches. They focus predominantly on high-level semantics and neglect pixel-level details, so they fail to capture fully how the brain encodes visual information. To overcome this, the authors propose a hierarchical alignment strategy that integrates multiple multi-scale pre-trained visual encoders with a contrastive learning objective and a newly designed Fusion Prior mechanism. Together, these components improve cross-modal distributional consistency between neural signals and image representations. The authors report state-of-the-art performance in both quantitative and qualitative evaluations, with improved reconstruction fidelity and no loss of retrieval accuracy. They present the work as the first to balance semantic correctness with fine-grained detail preservation in brain-to-image reconstruction.
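The following is a rough illustration of what a hierarchical, multi-encoder target could look like. The sketch fuses embeddings from several pre-trained visual encoders into a single vector by L2-normalizing each encoder's output and then concatenating the results. This combination rule is an assumption: the summary does not say how the authors combine their encoders. The encoder names `semantic` and `low_level` and the helper functions are hypothetical.

```python
import math
from typing import Dict, List


def l2_normalize(vec: List[float], eps: float = 1e-8) -> List[float]:
    """Scale a vector to (approximately) unit Euclidean length; eps avoids division by zero."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / (norm + eps) for x in vec]


def hierarchical_embedding(per_encoder: Dict[str, List[float]], order: List[str]) -> List[float]:
    """Hypothetical fusion: concatenate unit-normalized embeddings from several encoders.

    Assumption (not from the paper): each encoder's output is normalized on its own,
    so an encoder with a large raw scale cannot dominate the fused target.
    The fixed `order` makes the layout of the concatenated vector deterministic.
    """
    fused: List[float] = []
    for name in order:
        fused.extend(l2_normalize(per_encoder[name]))
    return fused


# Usage with made-up vectors: a "semantic" (high-level) encoder and a
# "low_level" (pixel-detail) encoder, both hypothetical.
target = hierarchical_embedding(
    {"semantic": [3.0, 4.0], "low_level": [0.0, 2.0, 0.0]},
    order=["semantic", "low_level"],
)
print(target)
```

A learned projection or a weighted sum would be an equally plausible way to combine the encoders. Concatenation is shown only because it keeps every level of the hierarchy visible in the target.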

📝 Abstract

Decoding visual representations from brain signals has attracted significant attention in both neuroscience and artificial intelligence. However, the degree to which brain signals truly encode visual information remains unclear. Current visual decoding approaches explore various brain-image alignment strategies, yet most emphasize high-level semantic features while neglecting pixel-level details, thereby limiting our understanding of the human visual system. In this paper, we propose a brain-image alignment strategy that leverages multiple pre-trained visual encoders with distinct inductive biases to capture hierarchical and multi-scale visual representations, while employing a contrastive learning objective to achieve effective alignment between brain signals and visual embeddings. Furthermore, we introduce a Fusion Prior, which learns a stable mapping on large-scale visual data and subsequently matches brain features to this pre-trained prior, thereby enhancing distributional consistency across modalities. Extensive quantitative and qualitative experiments demonstrate that our method achieves a favorable balance between retrieval accuracy and reconstruction fidelity.
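The contrastive alignment between brain signals and visual embeddings can be sketched with a standard symmetric InfoNCE (CLIP-style) loss. Under this loss, each brain embedding must identify its own image embedding among all images in the batch, and each image must identify its own brain embedding. This is an illustrative assumption: the abstract does not give the exact objective. The temperature of 0.07 and all function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def _normalize(vec: List[float]) -> List[float]:
    """Unit-normalize a vector (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def _log_softmax(row: List[float]) -> List[float]:
    """Numerically stable log-softmax (subtracts the row max before exponentiating)."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def symmetric_info_nce(brain: Matrix, image: Matrix, temperature: float = 0.07) -> float:
    """CLIP-style contrastive loss for a batch of paired (brain, image) embeddings.

    Row i of `brain` and row i of `image` come from the same stimulus (the positive pair);
    every other row in the batch acts as a negative. `temperature` is an assumed value.
    """
    brain_n = [_normalize(r) for r in brain]
    image_n = [_normalize(r) for r in image]
    n = len(brain_n)
    # logits[i][j] = cosine similarity of brain i and image j, divided by the temperature
    logits = [
        [sum(x * y for x, y in zip(brain_n[i], image_n[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Brain -> image: each row should put its probability mass on the diagonal entry.
    loss_b2i = -sum(_log_softmax(logits[i])[i] for i in range(n)) / n
    # Image -> brain: the same criterion applied to the columns.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_i2b = -sum(_log_softmax(cols[j])[j] for j in range(n)) / n
    return 0.5 * (loss_b2i + loss_i2b)


# Toy check: correctly paired embeddings vs. the same embeddings with the pairing swapped.
aligned = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(f"aligned={aligned:.6f} shuffled={shuffled:.3f}")
```

With matched pairs the loss is close to zero, and swapping the pairing raises it sharply. That gap is the signal that pulls each brain embedding toward its own image representation.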

Problem

visual decoding

brain signals

visual representation

pixel-level details

hierarchical representation

Innovation

hierarchical visual embeddings

brain-image alignment

contrastive learning