🤖 AI Summary
Problem: Diffusion models for fMRI-based visual stimulus reconstruction suffer from structural distortions, blurry textures, and inaccurate colors because the denoising process receives too little low-level perceptual guidance. Method: This work analyzes the diffusion process from a neuroscience perspective and introduces a bottom-up enhancement strategy driven by gradients of primary visual features, which allows semantic consistency and fine-grained fidelity to be optimized jointly. It also proposes a guidance strategy that keeps repeated outputs consistent with the original image. Built upon Stable Diffusion, the framework integrates an fMRI-to-visual-feature decoder, a multi-scale visual cortical feature projection module, and a gradient-reweighted diffusion guidance mechanism. Results: On the Natural Scenes Dataset (NSD), the method outperforms state-of-the-art approaches: qualitative evaluation shows better structural integrity and color accuracy, and quantitative metrics show improved cross-trial reconstruction consistency. Ablation studies confirm the efficacy of each component.
📝 Abstract
Reconstructing visual stimuli from functional Magnetic Resonance Imaging (fMRI) enables fine-grained retrieval of brain activity. However, accurately reconstructing diverse details such as structure, background, texture, and color remains challenging. Stable Diffusion models also inevitably produce variable reconstructions, even under identical conditions. To address this challenge, we first examine diffusion methods from a neuroscientific perspective: they rely primarily on top-down creation using knowledge pre-trained on extensive image datasets, but tend to lack detail-driven bottom-up perception, which leads to a loss of faithful details. In this paper, we propose NeuralDiffuser, which incorporates primary visual feature guidance to provide detailed cues in the form of gradients. This bottom-up extension of diffusion models achieves both semantic coherence and detail fidelity when reconstructing visual stimuli. Furthermore, we develop a novel guidance strategy for reconstruction tasks that ensures repeated outputs are consistent with the original images rather than with varying outputs. Extensive experiments on the Natural Scenes Dataset (NSD) demonstrate, qualitatively and quantitatively, the advantage of NeuralDiffuser over baseline and state-of-the-art methods, as well as through ablation studies.
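The idea of supplying detail cues "in the form of gradients" can be illustrated with a toy sketch. This is not NeuralDiffuser's implementation: the linear feature extractor, the squared-error loss, the `scale` parameter, and every function name below are assumptions made for illustration only. In the actual method the target features would be decoded from fMRI and the update would act on Stable Diffusion latents.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def extract_features(weights: Matrix, x: Vector) -> Vector:
    """Toy stand-in for a primary visual feature extractor: f(x) = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def feature_loss(weights: Matrix, x: Vector, target: Vector) -> float:
    """Squared error between the sample's features and the target features."""
    feats = extract_features(weights, x)
    return sum((f - t) ** 2 for f, t in zip(feats, target))


def feature_gradient(weights: Matrix, x: Vector, target: Vector) -> Vector:
    """Analytic gradient of feature_loss w.r.t. x: 2 * W^T (W x - target)."""
    residual = [f - t for f, t in zip(extract_features(weights, x), target)]
    return [
        2.0 * sum(weights[i][j] * residual[i] for i in range(len(weights)))
        for j in range(len(x))
    ]


def guided_step(proposal: Vector, weights: Matrix, target: Vector,
                scale: float) -> Vector:
    """Nudge a denoiser proposal down the feature-loss gradient.

    `proposal` plays the role of the sample produced by one (hypothetical)
    denoising step; the guidance term pulls its low-level features toward
    `target` without replacing the denoiser itself.
    """
    grad = feature_gradient(weights, proposal, target)
    return [p - scale * g for p, g in zip(proposal, grad)]


# Hypothetical example: a 3-dimensional "latent" and two feature channels.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
target_features = [1.0, -1.0]
proposal = [0.0, 0.0, 0.5]

guided = guided_step(proposal, W, target_features, scale=0.25)
loss_before = feature_loss(W, proposal, target_features)
loss_after = feature_loss(W, guided, target_features)
```

In this sketch the guided sample has a lower feature loss than the unguided proposal, while the component that the feature extractor does not observe is left untouched. The choice of `scale` trades off how strongly the low-level cues override the denoiser's own proposal.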