🤖 AI Summary
fMRI foundation models suffer from low data efficiency (they require massive datasets) and poor training efficiency (voxel-wise modeling causes memory blow-up). To address both bottlenecks, we propose a graph-free, lightweight two-stage adaptive architecture. In the first stage, a lightweight temporal extractor identifies highly salient time windows; in the second stage, a 4D hierarchical JEPA encoder models only the top-k windows while discarding about 70% of patches via masking, preserving voxel-level spatial fidelity while achieving substantial computational compression. The model enables end-to-end, atlas-free pretraining and achieves state-of-the-art performance across seven public benchmarks using only 4,000 fMRI pre-training sessions, while reducing GPU memory consumption to roughly 30% of that of conventional voxel-based methods. To our knowledge, this is the first fMRI foundation model to achieve both high data efficiency and high training efficiency simultaneously.
📝 Abstract
Foundation models are emerging as a powerful paradigm for fMRI analysis, but current approaches face a dual bottleneck of data and training efficiency. Atlas-based methods aggregate voxel signals into fixed regions of interest, which reduces data dimensionality but discards fine-grained spatial detail and requires extremely large cohorts to train effective general-purpose foundation models. Atlas-free methods, in contrast, operate directly on voxel-level signals, preserving spatial fidelity, but they are prohibitively memory- and compute-intensive, making large-scale pre-training infeasible. We introduce SLIM-Brain (Sample-efficient, Low-memory fMRI Foundation Model for Human Brain), a new atlas-free foundation model that improves both data and training efficiency simultaneously. SLIM-Brain adopts a two-stage adaptive design: (i) a lightweight temporal extractor captures global context across full sequences and ranks time windows by saliency, and (ii) a 4D hierarchical encoder (Hiera-JEPA) learns fine-grained voxel-level representations from only the top-$k$ selected windows, discarding about 70% of patches via masking. Extensive experiments on seven public benchmarks show that SLIM-Brain establishes new state-of-the-art performance across diverse tasks while requiring only about 4,000 pre-training sessions and approximately 30% of the GPU memory of traditional voxel-level methods.
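The two-stage design above can be sketched in plain Python. This is an illustrative toy, not the paper's implementation: the scoring function, the `mask_ratio=0.7` constant, and the flat-list representation of windows and patches are all assumptions standing in for the temporal extractor and the Hiera-JEPA encoder's input pipeline.

```python
import random

def saliency_scores(windows, score_fn):
    """Stage 1: a lightweight scorer ranks each temporal window.
    `score_fn` stands in for the paper's temporal extractor."""
    return [score_fn(w) for w in windows]

def select_top_k(windows, scores, k):
    """Keep only the k most salient windows for the heavy encoder."""
    order = sorted(range(len(windows)), key=lambda i: scores[i], reverse=True)
    return [windows[i] for i in order[:k]]

def mask_patches(window, mask_ratio=0.7, seed=0):
    """Stage 2 input prep: drop ~70% of patches so the encoder
    only processes the remaining visible ones (JEPA-style masking)."""
    rng = random.Random(seed)
    n_keep = max(1, round(len(window) * (1.0 - mask_ratio)))
    keep = sorted(rng.sample(range(len(window)), n_keep))
    return [window[i] for i in keep]

# Toy data: 6 "windows", each a list of 10 scalar "patches".
windows = [[float(i * 10 + j) for j in range(10)] for i in range(6)]
scores = saliency_scores(windows, score_fn=lambda w: sum(abs(x) for x in w))
top = select_top_k(windows, scores, k=2)
visible = [mask_patches(w) for w in top]
print(len(top), [len(v) for v in visible])  # → 2 [3, 3]
```

Only the visible 30% of patches in the top-k windows ever reach the expensive 4D encoder, which is the source of the memory savings the abstract reports.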