Scaling Vision Transformers for Functional MRI with Flat Maps

📅 2025-10-15
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses the challenge of representing fMRI data for modern deep learning architectures. The authors project 4D volumetric fMRI time series into videos of 2D flattened cortical maps and train Vision Transformers with a spatiotemporal masked autoencoder (MAE) framework to build a self-supervised foundation model for fMRI. Trained on flat-map videos from the roughly thousand-subject Human Connectome Project (HCP) dataset, masked fMRI modeling performance improves with dataset size according to a power scaling law, indicating the intrinsic scalability of the approach. The learned representations support fine-grained cross-subject brain state decoding (e.g., task condition identification) and subject-specific trait decoding (e.g., cognitive score prediction), with strong performance on downstream classification benchmarks. Code, pretrained models, and preprocessing pipelines are publicly released, providing a reproducible benchmark for fMRI foundation model research.
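The spatiotemporal MAE recipe summarized above can be illustrated with a minimal sketch (not the authors' code): a flat-map "video" is cut into non-overlapping spatiotemporal patches, and a large random fraction of patch tokens is masked so the encoder only sees the visible remainder. All sizes below are hypothetical placeholders, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

T, H, W = 16, 64, 64      # frames and flat-map height/width (assumed)
pt, ph, pw = 2, 16, 16    # spatiotemporal patch size (assumed)
mask_ratio = 0.75         # masking ratio typical for MAE-style training

video = rng.standard_normal((T, H, W)).astype(np.float32)

# Cut the video into non-overlapping (pt, ph, pw) patches, one token each.
patches = (
    video.reshape(T // pt, pt, H // ph, ph, W // pw, pw)
         .transpose(0, 2, 4, 1, 3, 5)
         .reshape(-1, pt * ph * pw)
)
n_tokens = patches.shape[0]
n_keep = int(n_tokens * (1 - mask_ratio))

# Random masking: the encoder processes only the kept (visible) tokens;
# the decoder would later reconstruct the masked ones.
perm = rng.permutation(n_tokens)
visible = patches[perm[:n_keep]]

print(n_tokens, visible.shape)
```

With these placeholder sizes the video yields 128 tokens of 512 values each, of which 32 remain visible at a 75% masking ratio.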

📝 Abstract
A key question for adapting modern deep learning architectures to functional MRI (fMRI) is how to represent the data for model input. To bridge the modality gap between fMRI and natural images, we transform the 4D volumetric fMRI data into videos of 2D fMRI activity flat maps. We train Vision Transformers on 2.3K hours of fMRI flat map videos from the Human Connectome Project using the spatiotemporal masked autoencoder (MAE) framework. We observe that masked fMRI modeling performance improves with dataset size according to a strict power scaling law. Downstream classification benchmarks show that our model learns rich representations supporting both fine-grained state decoding across subjects and subject-specific trait decoding across changes in brain state. This work is part of an ongoing open science project to build foundation models for fMRI data. Our code and datasets are available at https://github.com/MedARC-AI/fmri-fm.
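The power scaling law mentioned in the abstract has the form L(N) ~ a * N^(-b), where L is the masked-modeling loss and N is the amount of training data; such a law is linear in log-log space, so its exponent can be estimated with a simple fit. The numbers below are synthetic, used only to show how the exponent would be recovered; they are not the paper's measurements.

```python
import numpy as np

# Synthetic loss values generated from an assumed power law
# L(N) = a * N^(-b) with a = 0.8, b = 0.15 (illustrative only).
hours = np.array([100.0, 300.0, 700.0, 1500.0, 2300.0])
loss = 0.8 * hours ** -0.15

# A power law is linear in log-log space: log L = log a - b * log N,
# so a degree-1 fit on the logs recovers the exponent and prefactor.
slope, intercept = np.polyfit(np.log(hours), np.log(loss), 1)
b_hat, a_hat = -slope, np.exp(intercept)

print(round(b_hat, 3), round(a_hat, 3))
```

Because the synthetic points lie exactly on the assumed law, the fit recovers b = 0.15 and a = 0.8; on real training curves one would fit measured losses the same way.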
Problem

Research questions and friction points this paper is trying to address.

Representing fMRI data for deep learning model input
Bridging modality gap between fMRI and natural images
Building foundation models for fMRI data analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transform volumetric fMRI into 2D flat map videos
Train Vision Transformers using spatiotemporal masked autoencoder
Demonstrate power-law scaling of masked fMRI modeling performance with dataset size