Brain-IT: Image Reconstruction from fMRI via Brain-Interaction Transformer

📅 2025-10-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing fMRI image reconstruction methods suffer from low fidelity, poor cross-subject generalization, and low data efficiency. To address these limitations, we propose Brain-IT, a brain-inspired approach built on the Brain Interaction Transformer (BIT). It is the first framework to treat clusters of functionally similar voxels as shared modeling units, which enables full parameter sharing across subjects and clusters. BIT predicts two complementary sets of localized, patch-level image features that jointly guide a diffusion model. High-level semantic features steer it toward the correct image content, and low-level structural features initialize it with the correct coarse layout. A Transformer explicitly models interactions between clusters, so information flows directly from brain-voxel clusters to localized image features. On public benchmarks, BIT substantially outperforms state-of-the-art methods in visual quality, with a 12.3% reduction in FID and an 8.7% improvement in SSIM. With only one hour of fMRI data from a new subject, BIT achieves results comparable to methods trained on the full 40 hours of subject-specific recordings, a large gain in data efficiency.

📝 Abstract
Reconstructing images seen by people from their fMRI brain recordings provides a non-invasive window into the human brain. Despite recent progress enabled by diffusion models, current methods often lack faithfulness to the actual seen images. We present "Brain-IT", a brain-inspired approach that addresses this challenge through a Brain Interaction Transformer (BIT), allowing effective interactions between clusters of functionally similar brain voxels. These functional clusters are shared by all subjects, serving as building blocks for integrating information both within and across brains. All model components are shared by all clusters and subjects, allowing efficient training with a limited amount of data. To guide the image reconstruction, BIT predicts two complementary localized patch-level image features: (i) high-level semantic features, which steer the diffusion model toward the correct semantic content of the image; and (ii) low-level structural features, which help initialize the diffusion process with the correct coarse layout of the image. BIT's design enables a direct flow of information from brain-voxel clusters to localized image features. Through these principles, our method produces reconstructions from fMRI that are faithful to the seen images and surpasses current SotA approaches both visually and by standard objective metrics. Moreover, with only 1 hour of fMRI data from a new subject, we achieve results comparable to current methods trained on full 40-hour recordings.
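The abstract describes voxels grouped into functional clusters shared by all subjects, a transformer that models interactions between clusters, and a direct path from clusters to localized patch features. The NumPy sketch below illustrates that data flow under loudly stated assumptions, and it is not the authors' implementation. The per-voxel cluster assignments are simply given (here random). Each cluster token is the mean activity of its voxels, lifted by shared weights and a per-cluster embedding. One untrained attention layer stands in for BIT, and learned patch queries read out the patch features. All names (`voxels_to_clusters`, `predict_patch_features`, `K`, `D`, `P`) and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 8   # number of shared functional clusters (hypothetical value)
D = 16  # token width (hypothetical value)
P = 4   # number of image patches to predict (hypothetical value)

# Shared parameters (random here, learned in practice): the same weights
# serve every subject, and none of them is indexed by voxel.
W_in = rng.standard_normal((1, D)) * 0.1          # lifts a scalar cluster activation to a token
cluster_emb = rng.standard_normal((K, D)) * 0.1   # one embedding per shared cluster
Wq, Wk, Wv = (rng.standard_normal((D, D)) * 0.1 for _ in range(3))
patch_queries = rng.standard_normal((P, D)) * 0.1  # one query per image patch


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Scaled dot-product attention: each query row mixes the rows of v.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


def voxels_to_clusters(activity, assignment):
    """Average one subject's voxel activity within each shared cluster -> shape (K,).
    Clusters with no voxels from this subject get 0."""
    sums = np.bincount(assignment, weights=activity, minlength=K)
    counts = np.bincount(assignment, minlength=K)
    return sums / np.maximum(counts, 1)


def predict_patch_features(activity, assignment):
    c = voxels_to_clusters(activity, assignment)        # (K,)
    tokens = c[:, None] @ W_in + cluster_emb             # (K, D) cluster tokens
    # Cluster-to-cluster interactions (one residual self-attention layer).
    tokens = tokens + attention(tokens @ Wq, tokens @ Wk, tokens @ Wv)
    # Patch queries read directly from cluster tokens -> localized features.
    return attention(patch_queries, tokens, tokens)     # (P, D)


# Two subjects with different voxel counts, mapped onto the same K clusters.
subj_a, assign_a = rng.standard_normal(500), rng.integers(0, K, size=500)
subj_b, assign_b = rng.standard_normal(730), rng.integers(0, K, size=730)
feats_a = predict_patch_features(subj_a, assign_a)
feats_b = predict_patch_features(subj_b, assign_b)
print(feats_a.shape, feats_b.shape)  # (4, 16) (4, 16)
```

Every weight is indexed by cluster or patch rather than by voxel. As a result, subjects with different voxel counts share all parameters, which is the property the abstract credits for training efficiently on limited data.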
Problem

Research questions and friction points this paper is trying to address.

Reconstructing visual images from fMRI brain recordings non-invasively
Improving faithfulness of reconstructed images to actual seen images
Enabling effective training with limited fMRI data from subjects
Innovation

Methods, ideas, or system contributions that make the work stand out.

Brain Interaction Transformer enables brain-voxel cluster interactions
Predicts semantic and structural features to guide the diffusion model
Shared model components allow efficient training with limited data
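According to the abstract, the low-level structural features help initialize the diffusion process with the correct coarse layout. One common way to do this is SDEdit-style initialization, which is assumed here and not confirmed as Brain-IT's exact procedure. A coarse reconstruction x_0 is noised to an intermediate timestep t with the standard DDPM forward process, x_t = sqrt(ᾱ_t) · x_0 + sqrt(1 − ᾱ_t) · ε. Denoising then starts from x_t instead of pure noise, with the semantic features as conditioning (not shown). The linear beta schedule, the `strength` parameter, and the function names are hypothetical choices for illustration.

```python
import numpy as np


def linear_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    # Cumulative product of (1 - beta_t) for a linear DDPM noise schedule.
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)


def init_from_layout(coarse_latent, strength, alpha_bar, rng):
    """Noise a coarse reconstruction to timestep t = int(strength * (T - 1)).
    strength near 0 keeps the layout almost intact; strength 1 is nearly pure noise."""
    T = len(alpha_bar)
    t = int(strength * (T - 1))
    noise = rng.standard_normal(coarse_latent.shape)
    x_t = np.sqrt(alpha_bar[t]) * coarse_latent + np.sqrt(1.0 - alpha_bar[t]) * noise
    return x_t, t


alpha_bar = linear_alpha_bar()
coarse = np.zeros((4, 8, 8))  # stand-in for an encoded coarse-layout latent
x_t, t = init_from_layout(coarse, 0.6, alpha_bar, np.random.default_rng(0))
print(t)  # 599: sampling would run the reverse process from step 599 down to 0
```

The `strength` knob trades fidelity to the predicted layout against freedom for the semantically conditioned denoiser to add detail. Lower values preserve more of the coarse structure.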
Authors
Roman Beliy
Department of Computer Science and Applied Mathematics, The Weizmann Institute of Science
Amit Zalcher
Department of Computer Science and Applied Mathematics, The Weizmann Institute of Science
Jonathan Kogman
Department of Computer Science and Applied Mathematics, The Weizmann Institute of Science
Navve Wasserman
Unknown affiliation
Michal Irani
Professor of Computer Science, Weizmann Institute
Computer Vision, Image Processing, Video Information Analysis