🤖 AI Summary
Existing fMRI-to-image decoding methods typically rely on intermediate feature spaces (e.g., image or text embeddings), obscuring the dynamic, region-specific contributions of cortical areas to the generative process. To address this, we propose NeuroAdapter, a novel framework that conditions latent diffusion models (LDMs) directly and end-to-end on fMRI signals, bypassing intermediate representations to preserve neural information fidelity. We further introduce IBBI (Image-Brain BI-directional interpretability), a framework that quantitatively characterizes how distinct brain regions modulate successive stages of generation by analyzing cross-attention weight distributions across diffusion timesteps. Evaluated on public fMRI datasets, our method achieves reconstruction quality competitive with state-of-the-art approaches while substantially improving the interpretability of neural-image correspondences. By unifying high-fidelity neural decoding with mechanistic, process-level interpretability, this work establishes a new paradigm for brain-computer interfaces and computational neuroscience.
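To make "direct conditioning" concrete, below is a minimal PyTorch sketch of the general idea; the module name, depth, and shapes are our illustrative assumptions, not the paper's actual architecture. Flattened voxel activity is projected to a token sequence that stands in for the text embeddings an LDM's cross-attention layers would normally receive.

```python
# Hedged sketch: an adapter mapping fMRI voxels to cross-attention tokens.
# `FMRIAdapter`, its sizes, and the single-linear design are assumptions.
import torch
import torch.nn as nn

class FMRIAdapter(nn.Module):
    def __init__(self, n_voxels: int, n_tokens: int = 77, dim: int = 768):
        super().__init__()
        self.n_tokens, self.dim = n_tokens, dim
        # One linear projection plus a norm; a real adapter may be deeper.
        self.proj = nn.Linear(n_voxels, n_tokens * dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, voxels: torch.Tensor) -> torch.Tensor:
        # voxels: (batch, n_voxels) -> brain tokens: (batch, n_tokens, dim)
        b = voxels.shape[0]
        tokens = self.proj(voxels).view(b, self.n_tokens, self.dim)
        return self.norm(tokens)

adapter = FMRIAdapter(n_voxels=1024)  # toy size; visual-cortex ROIs are far larger
brain_tokens = adapter(torch.randn(4, 1024))  # -> (4, 77, 768)
# These tokens would then condition the denoiser via cross-attention,
# e.g. (diffusers-style): unet(latents, t, encoder_hidden_states=brain_tokens)
```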
📝 Abstract
Recent work has demonstrated that complex visual stimuli can be decoded from human brain activity using deep generative models, helping neuroscience researchers interpret how the brain represents real-world scenes. However, most current approaches map brain signals into intermediate image or text feature spaces before guiding the generative process, obscuring how different brain areas contribute to the final reconstruction. In this work, we propose NeuroAdapter, a visual decoding framework that directly conditions a latent diffusion model on brain representations, bypassing the need for intermediate feature spaces. Our method achieves visual reconstruction quality on public fMRI datasets competitive with prior work, while providing greater transparency into how brain signals shape the generation process. To this end, we contribute an Image-Brain BI-directional interpretability framework (IBBI), which analyzes cross-attention mechanisms across diffusion denoising steps to reveal how different cortical areas influence the unfolding generative trajectory. Our results highlight the potential of end-to-end brain-to-image decoding and establish a path toward interpreting diffusion models through the lens of visual neuroscience.
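The IBBI-style readout described above can be sketched as follows; this is a hedged illustration under our own assumptions (function name, tensor shapes, and a precomputed set of attention maps rather than the paper's actual hooks). Given per-timestep cross-attention maps from image queries to brain tokens, it aggregates the attention mass each cortical region receives at each denoising step.

```python
# Hedged sketch of aggregating cross-attention mass per cortical region
# across diffusion timesteps. All names and shapes are illustrative.
import torch

def region_attention_profile(attn_maps, token_to_region):
    """attn_maps: {timestep: (heads, n_image_queries, n_brain_tokens)}
    token_to_region: (n_brain_tokens,) long tensor, token -> region id.
    Returns {timestep: (n_regions,)} mean attention mass per region."""
    n_regions = int(token_to_region.max()) + 1
    profile = {}
    for t, attn in attn_maps.items():
        mass = attn.mean(dim=(0, 1))              # average over heads/queries
        per_region = torch.zeros(n_regions)
        per_region.scatter_add_(0, token_to_region, mass)
        profile[t] = per_region                   # region influence at step t
    return profile

# Toy usage: 8 heads, 64x64 latent (4096 queries), 77 brain tokens,
# tokens assigned to 4 hypothetical cortical regions.
attn = {t: torch.softmax(torch.randn(8, 4096, 77), dim=-1) for t in (999, 500, 1)}
regions = torch.randint(0, 4, (77,))
print(region_attention_profile(attn, regions))
```

In practice, one would capture the real attention probabilities (e.g., via forward hooks on the denoiser's cross-attention layers) instead of the random maps used here; comparing the resulting per-region profiles across timesteps is what reveals when in the generative trajectory each cortical area exerts influence.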