🤖 AI Summary
fMRI foundation models suffer from low data efficiency (they require massive datasets) and poor training efficiency (voxel-wise modeling causes memory blow-up). To address both bottlenecks, we propose a graph-free, lightweight two-stage adaptive architecture. In the first stage, a lightweight temporal extractor identifies highly salient time windows; in the second stage, a 4D hierarchical JEPA encoder models only the top-k windows while discarding about 70% of patches via masking, preserving voxel-level spatial fidelity while achieving substantial computational compression. The model enables end-to-end, atlas-free pretraining and achieves state-of-the-art performance across seven public benchmarks using only 4,000 fMRI pre-training sessions, while reducing GPU memory consumption to roughly 30% of that of conventional voxel-based methods. To our knowledge, this is the first fMRI foundation model to achieve both high data efficiency and high training efficiency simultaneously.
📝 Abstract
Foundation models are emerging as a powerful paradigm for fMRI analysis, but current approaches face a dual bottleneck of data and training efficiency. Atlas-based methods aggregate voxel signals into fixed regions of interest, which reduces data dimensionality but discards fine-grained spatial detail and requires extremely large cohorts to train effective general-purpose foundation models. Atlas-free methods, in contrast, operate directly on voxel-level signals, preserving spatial fidelity, but they are prohibitively memory- and compute-intensive, making large-scale pre-training infeasible. We introduce SLIM-Brain (Sample-efficient, Low-memory fMRI Foundation Model for Human Brain), a new atlas-free foundation model that improves both data and training efficiency simultaneously. SLIM-Brain adopts a two-stage adaptive design: (i) a lightweight temporal extractor captures global context across full sequences and ranks time windows by saliency, and (ii) a 4D hierarchical encoder (Hiera-JEPA) learns fine-grained voxel-level representations from only the top-$k$ selected windows, discarding about 70% of patches via masking. Extensive experiments on seven public benchmarks show that SLIM-Brain establishes new state-of-the-art performance across diverse tasks while requiring only about 4,000 pre-training sessions and approximately 30% of the GPU memory of traditional voxel-level methods.
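The two-stage design above can be sketched in plain Python. This is an illustrative toy, not the paper's implementation: the scoring function, the `mask_ratio=0.7` constant, and the flat-list representation of windows and patches are all assumptions standing in for the temporal extractor and the Hiera-JEPA encoder's input pipeline.

```python
import random

def saliency_scores(windows, score_fn):
    """Stage 1: a lightweight scorer ranks each temporal window.
    `score_fn` stands in for the paper's temporal extractor."""
    return [score_fn(w) for w in windows]

def select_top_k(windows, scores, k):
    """Keep only the k most salient windows for the heavy encoder."""
    order = sorted(range(len(windows)), key=lambda i: scores[i], reverse=True)
    return [windows[i] for i in order[:k]]

def mask_patches(window, mask_ratio=0.7, seed=0):
    """Stage 2 input prep: drop ~70% of patches so the encoder
    only processes the remaining visible ones (JEPA-style masking)."""
    rng = random.Random(seed)
    n_keep = max(1, round(len(window) * (1.0 - mask_ratio)))
    keep = sorted(rng.sample(range(len(window)), n_keep))
    return [window[i] for i in keep]

# Toy data: 6 "windows", each a list of 10 scalar "patches".
windows = [[float(i * 10 + j) for j in range(10)] for i in range(6)]
scores = saliency_scores(windows, score_fn=lambda w: sum(abs(x) for x in w))
top = select_top_k(windows, scores, k=2)
visible = [mask_patches(w) for w in top]
print(len(top), [len(v) for v in visible])  # → 2 [3, 3]
```

Only the visible 30% of patches in the top-k windows ever reach the expensive 4D encoder, which is the source of the memory savings the abstract reports.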