Hi-DREAM: Brain Inspired Hierarchical Diffusion for fMRI Reconstruction via ROI Encoder and visuAl Mapping

📅 2025-11-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current fMRI-to-image reconstruction methods typically condition diffusion models directly on fMRI features, ignoring the hierarchical organization of the visual cortex and blurring the distinct roles of early, middle, and late visual areas. To address this, we propose Hi-DREAM, a brain-inspired hierarchical diffusion framework with three components: first, an ROI encoder groups fMRI signals into early/mid/late visual streams; second, these streams are converted into a multi-scale cortical pyramid aligned with U-Net depth; third, a lightweight ControlNet injects the scale-specific conditions at matching depths during denoising. The work explicitly integrates the anatomical and functional hierarchy of the ventral visual pathway into diffusion modeling, with the aim of improving interpretability and semantic fidelity. Evaluated on the Natural Scenes Dataset (NSD), our method attains state-of-the-art performance on high-level semantic metrics (e.g., CLIP Score) while maintaining competitive low-level fidelity, which supports the value of cortical hierarchical modeling for neural decoding.

📝 Abstract
Mapping human brain activity to natural images offers a new window into vision and cognition, yet current diffusion-based decoders face a core difficulty: most condition directly on fMRI features without analyzing how visual information is organized across the cortex. This overlooks the brain's hierarchical processing and blurs the roles of early, middle, and late visual areas. We propose Hi-DREAM, a brain-inspired conditional diffusion framework that makes the cortical organization explicit. A region-of-interest (ROI) adapter groups fMRI into early/mid/late streams and converts them into a multi-scale cortical pyramid aligned with the U-Net depth (shallow scales preserve layout and edges; deeper scales emphasize objects and semantics). A lightweight, depth-matched ControlNet injects these scale-specific hints during denoising. The result is an efficient and interpretable decoder in which each signal plays a brain-like role, allowing the model not only to reconstruct images but also to illuminate functional contributions of different visual areas. Experiments on the Natural Scenes Dataset (NSD) show that Hi-DREAM attains state-of-the-art performance on high-level semantic metrics while maintaining competitive low-level fidelity. These findings suggest that structuring conditioning by cortical hierarchy is a powerful alternative to purely data-driven embeddings and provides a useful lens for studying the visual cortex.
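The grouping and pyramid steps described in the abstract can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the ROI names and their assignment to streams (`STREAMS`), the feature-map sizes (`SCALES`), the voxel counts, and the random linear projections that stand in for learned adapter weights.

```python
import numpy as np

# Hypothetical ROI-to-stream assignment; the paper's actual grouping is not given here.
STREAMS = {
    "early": ["V1", "V2"],
    "mid": ["V3", "V4"],
    "late": ["LOC", "FFA", "PPA"],
}
# Assumed U-Net feature-map sizes, shallow to deep: (channels, height, width).
SCALES = {"early": (4, 32, 32), "mid": (8, 16, 16), "late": (16, 8, 8)}


def group_streams(roi_voxels):
    """Concatenate per-ROI voxel vectors into early/mid/late stream vectors."""
    return {
        name: np.concatenate([roi_voxels[r] for r in rois])
        for name, rois in STREAMS.items()
    }


def build_pyramid(streams, rng):
    """Project each stream to a feature map at its matched scale.

    Random linear projections stand in for learned adapter weights.
    """
    pyramid = {}
    for name, vec in streams.items():
        c, h, w = SCALES[name]
        weight = rng.standard_normal((c * h * w, vec.size)) / np.sqrt(vec.size)
        pyramid[name] = (weight @ vec).reshape(c, h, w)
    return pyramid


rng = np.random.default_rng(0)
# Toy data: 50 voxels per ROI (an arbitrary choice for the example).
roi_voxels = {r: rng.standard_normal(50) for rois in STREAMS.values() for r in rois}
streams = group_streams(roi_voxels)
pyramid = build_pyramid(streams, rng)
for name, fmap in pyramid.items():
    print(name, fmap.shape)
```

The point of the sketch is the shape bookkeeping: the early stream lands on the largest, shallowest map (layout and edges), and the late stream on the smallest, deepest map (objects and semantics).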

Problem

Research questions and friction points this paper is trying to address.

Reconstructing natural images from fMRI data in a way that reflects the brain's hierarchical visual processing
Separating the roles of early, middle, and late visual areas, which current diffusion decoders blur together
Aligning cortical organization with the diffusion model architecture for interpretable reconstruction

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical grouping of fMRI signals into early/mid/late streams via an ROI encoder
Multi-scale cortical pyramid aligned with U-Net depth
Lightweight, depth-matched ControlNet that injects scale-specific hints during denoising
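
Depth-matched injection can be sketched as adding each hint to the U-Net feature map of the same depth through a zero-initialised 1x1 projection, the usual ControlNet-style design. This is a simplified illustration under stated assumptions, not the paper's implementation: the feature-map shapes, the helper names `zero_conv_1x1` and `inject`, and the use of random arrays in place of real U-Net activations are all hypothetical.

```python
import numpy as np


def zero_conv_1x1(channels):
    """1x1 conv weight initialised to zero, as in ControlNet-style adapters."""
    return np.zeros((channels, channels))


def inject(unet_feats, hints, weights):
    """Add each projected hint to the U-Net feature map of matching depth."""
    out = []
    for feat, hint, w in zip(unet_feats, hints, weights):
        assert feat.shape == hint.shape, "hint must match its U-Net depth"
        # A 1x1 convolution is a per-pixel linear map over channels.
        residual = np.einsum("oc,chw->ohw", w, hint)
        out.append(feat + residual)
    return out


rng = np.random.default_rng(0)
shapes = [(4, 32, 32), (8, 16, 16), (16, 8, 8)]  # assumed shallow-to-deep sizes
unet_feats = [rng.standard_normal(s) for s in shapes]
hints = [rng.standard_normal(s) for s in shapes]
weights = [zero_conv_1x1(s[0]) for s in shapes]

injected = inject(unet_feats, hints, weights)
# With zero-initialised weights the hints have no effect yet.
unchanged = all(np.array_equal(a, b) for a, b in zip(injected, unet_feats))
print(unchanged)
```

Starting from zero weights means the conditioned model initially behaves exactly like the unconditioned one, and the hints only start to influence denoising as the projections are trained.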