🤖 AI Summary
Existing fMRI-to-image decoding methods typically rely on intermediate feature spaces (e.g., image or text embeddings), obscuring the dynamic, region-specific contributions of cortical areas to the generative process. To address this, we propose NeuroAdapter, a novel framework that conditions latent diffusion models (LDMs) directly and end-to-end on fMRI signals, bypassing intermediate representations to preserve neural information fidelity. We further introduce IBBI (Image-Brain BI-directional interpretability), a framework that quantitatively characterizes how distinct brain regions modulate successive stages of generation by analyzing cross-attention weight distributions across diffusion timesteps. Evaluated on public fMRI datasets, our method achieves reconstruction quality competitive with state-of-the-art approaches while substantially improving the interpretability of neural-image correspondences. By unifying high-fidelity neural decoding with mechanistic, process-level interpretability, this work establishes a new paradigm for brain-computer interfaces and computational neuroscience.
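To make "direct conditioning" concrete, below is a minimal PyTorch sketch of the general idea; the module name, depth, and shapes are our illustrative assumptions, not the paper's actual architecture. Flattened voxel activity is projected to a token sequence that stands in for the text embeddings an LDM's cross-attention layers would normally receive.

```python
# Hedged sketch: an adapter mapping fMRI voxels to cross-attention tokens.
# `FMRIAdapter`, its sizes, and the single-linear design are assumptions.
import torch
import torch.nn as nn

class FMRIAdapter(nn.Module):
    def __init__(self, n_voxels: int, n_tokens: int = 77, dim: int = 768):
        super().__init__()
        self.n_tokens, self.dim = n_tokens, dim
        # One linear projection plus a norm; a real adapter may be deeper.
        self.proj = nn.Linear(n_voxels, n_tokens * dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, voxels: torch.Tensor) -> torch.Tensor:
        # voxels: (batch, n_voxels) -> brain tokens: (batch, n_tokens, dim)
        b = voxels.shape[0]
        tokens = self.proj(voxels).view(b, self.n_tokens, self.dim)
        return self.norm(tokens)

adapter = FMRIAdapter(n_voxels=1024)  # toy size; visual-cortex ROIs are far larger
brain_tokens = adapter(torch.randn(4, 1024))  # -> (4, 77, 768)
# These tokens would then condition the denoiser via cross-attention,
# e.g. (diffusers-style): unet(latents, t, encoder_hidden_states=brain_tokens)
```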
📝 Abstract
Recent work has demonstrated that complex visual stimuli can be decoded from human brain activity using deep generative models, helping neuroscience researchers interpret how the brain represents real-world scenes. However, most current approaches map brain signals into intermediate image or text feature spaces before guiding the generative process, obscuring how different brain areas contribute to the final reconstruction. In this work, we propose NeuroAdapter, a visual decoding framework that directly conditions a latent diffusion model on brain representations, bypassing the need for intermediate feature spaces. Our method achieves visual reconstruction quality on public fMRI datasets competitive with prior work, while providing greater transparency into how brain signals shape the generation process. To this end, we contribute an Image-Brain BI-directional interpretability framework (IBBI), which analyzes cross-attention mechanisms across diffusion denoising steps to reveal how different cortical areas influence the unfolding generative trajectory. Our results highlight the potential of end-to-end brain-to-image decoding and establish a path toward interpreting diffusion models through the lens of visual neuroscience.
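The IBBI-style readout described above can be sketched as follows; this is a hedged illustration under our own assumptions (function name, tensor shapes, and a precomputed set of attention maps rather than the paper's actual hooks). Given per-timestep cross-attention maps from image queries to brain tokens, it aggregates the attention mass each cortical region receives at each denoising step.

```python
# Hedged sketch of aggregating cross-attention mass per cortical region
# across diffusion timesteps. All names and shapes are illustrative.
import torch

def region_attention_profile(attn_maps, token_to_region):
    """attn_maps: {timestep: (heads, n_image_queries, n_brain_tokens)}
    token_to_region: (n_brain_tokens,) long tensor, token -> region id.
    Returns {timestep: (n_regions,)} mean attention mass per region."""
    n_regions = int(token_to_region.max()) + 1
    profile = {}
    for t, attn in attn_maps.items():
        mass = attn.mean(dim=(0, 1))              # average over heads/queries
        per_region = torch.zeros(n_regions)
        per_region.scatter_add_(0, token_to_region, mass)
        profile[t] = per_region                   # region influence at step t
    return profile

# Toy usage: 8 heads, 64x64 latent (4096 queries), 77 brain tokens,
# tokens assigned to 4 hypothetical cortical regions.
attn = {t: torch.softmax(torch.randn(8, 4096, 77), dim=-1) for t in (999, 500, 1)}
regions = torch.randint(0, 4, (77,))
print(region_attention_profile(attn, regions))
```

In practice, one would capture the real attention probabilities (e.g., via forward hooks on the denoiser's cross-attention layers) instead of the random maps used here; comparing the resulting per-region profiles across timesteps is what reveals when in the generative trajectory each cortical area exerts influence.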