🤖 AI Summary
Existing brain-to-image decoding methods rely on multi-stage pipelines and preprocessing, in particular temporal compression of fMRI signals, which hinders time-resolved image reconstruction. This work introduces Dynadiff, a single-stage, end-to-end diffusion model that synthesizes images directly from time-series fMRI data, without collapsing the temporal dimension or training in separate stages. Key contributions include: (1) a temporally aware fMRI feature encoder that explicitly models the dynamic evolution of neural activity; and (2) a cross-modal latent-space alignment mechanism coupled with dynamic conditional generation, enabling both semantic-level image reconstruction and fine-grained analysis of how representations change over time. Evaluated on time-resolved fMRI data, the method outperforms state-of-the-art approaches, especially on high-level semantic metrics, while remaining competitive on preprocessed, time-collapsed fMRI benchmarks. It also enables a characterization of how image representations evolve in brain activity, at the temporal resolution of the fMRI recordings.
📝 Abstract
Brain-to-image decoding has recently been propelled by progress in generative AI models and by the availability of large ultra-high field functional Magnetic Resonance Imaging (fMRI) datasets. However, current approaches depend on complicated multi-stage pipelines and on preprocessing steps that typically collapse the temporal dimension of brain recordings, which limits time-resolved brain decoders. Here, we introduce Dynadiff (Dynamic Neural Activity Diffusion for Image Reconstruction), a new single-stage diffusion model designed for reconstructing images from dynamically evolving fMRI recordings. Our approach offers three main contributions. First, Dynadiff simplifies training compared with existing approaches. Second, our model outperforms state-of-the-art models on time-resolved fMRI signals, especially on high-level semantic image reconstruction metrics, while remaining competitive on preprocessed fMRI data that collapse time. Third, this approach allows a precise characterization of the evolution of image representations in brain activity. Overall, this work lays the foundation for time-resolved brain-to-image decoding.
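
To make the distinction between time-collapsed and time-resolved inputs concrete, here is a minimal sketch. It is not taken from the Dynadiff code: the function names `collapse_time` and `time_resolved`, the window parameters, and the array shapes are illustrative assumptions, and the data are synthetic. Averaging over a post-stimulus window is used only as one simple example of a preprocessing step that removes the temporal dimension.

```python
import numpy as np

def collapse_time(bold, onset, window):
    """Time-collapsed input: average the signal over a post-stimulus window.

    bold has shape (n_voxels, n_timepoints); the result has shape (n_voxels,).
    """
    return bold[:, onset:onset + window].mean(axis=1)

def time_resolved(bold, onset, window):
    """Time-resolved input: keep every volume in the window.

    The result has shape (n_voxels, window).
    """
    return bold[:, onset:onset + window]

# Synthetic stand-in for one recording: 1000 voxels, 40 volumes (assumed sizes).
rng = np.random.default_rng(0)
bold = rng.standard_normal((1000, 40))

static_input = collapse_time(bold, onset=3, window=6)
dynamic_input = time_resolved(bold, onset=3, window=6)

print(static_input.shape)   # (1000,)
print(dynamic_input.shape)  # (1000, 6)
```

A decoder trained on `static_input` sees a single vector per stimulus, whereas one trained on `dynamic_input` can in principle use how the response changes from volume to volume, which is the setting the abstract calls time-resolved.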