Meta-Learning an In-Context Transformer Model of Human Higher Visual Cortex

📅 2025-05-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high cost of fMRI data acquisition and the poor cross-subject and cross-stimulus generalization of voxel-wise neural response prediction in higher visual cortex. We propose BraInCoRL, a fine-tuning-free few-shot prediction framework. Methodologically, we introduce the first in-context Transformer tailored to visual cortical modeling, which jointly encodes image features and voxel activations while enabling semantic mapping from natural language to neural selectivity. Our contributions are threefold: (1) the first demonstration of few-shot, fine-tuning-free prediction using only a handful of fMRI samples from novel subjects and novel images; (2) consistently high prediction accuracy across subject-, stimulus-, and dataset-level shifts, as well as under varying MRI acquisition parameters; and (3) substantial performance gains over state-of-the-art voxel encoders, alongside enhanced semantic interpretability and robust generalization.
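
To make the in-context interface concrete, below is a minimal PyTorch sketch of how a transformer can jointly condition on (image feature, voxel activation) context pairs and predict responses for novel query images. All names here (`VoxelInContextTransformer`, `d_feat`, the token layout) are hypothetical illustrations, not the paper's actual BraInCoRL architecture, which may tokenize and decode differently.

```python
import torch
import torch.nn as nn

class VoxelInContextTransformer(nn.Module):
    """Hypothetical in-context voxel encoder (not the paper's exact architecture)."""

    def __init__(self, d_feat: int = 512, d_model: int = 512, n_layers: int = 8):
        super().__init__()
        # Context tokens carry an image feature plus that voxel's measured response;
        # query tokens carry only the image feature (the response is what we predict).
        # d_feat = 512 matches e.g. CLIP ViT-B/32 embeddings (an assumption).
        self.ctx_proj = nn.Linear(d_feat + 1, d_model)
        self.qry_proj = nn.Linear(d_feat, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.readout = nn.Linear(d_model, 1)

    def forward(self, ctx_feats, ctx_resp, qry_feats):
        # ctx_feats: (B, K, d_feat)  features of K context images (K may vary per call)
        # ctx_resp:  (B, K)          measured voxel activations for those images
        # qry_feats: (B, Q, d_feat)  features of Q novel query images
        ctx = self.ctx_proj(torch.cat([ctx_feats, ctx_resp.unsqueeze(-1)], dim=-1))
        qry = self.qry_proj(qry_feats)
        h = self.encoder(torch.cat([ctx, qry], dim=1))  # joint attention over context + queries
        return self.readout(h[:, ctx.shape[1]:]).squeeze(-1)  # (B, Q) predicted responses
```

Because the context length K is just a sequence dimension, the same weights handle any number of few-shot examples at test time, which is what allows fine-tuning-free prediction for a novel subject.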

📝 Abstract
Understanding functional representations within higher visual cortex is a fundamental question in computational neuroscience. While artificial neural networks pretrained on large-scale datasets exhibit striking representational alignment with human neural responses, learning image-computable models of visual cortex relies on individual-level, large-scale fMRI datasets. The necessity for expensive, time-intensive, and often impractical data acquisition limits the generalizability of encoders to new subjects and stimuli. BraInCoRL uses in-context learning to predict voxelwise neural responses from few-shot examples without any additional fine-tuning for novel subjects and stimuli. We leverage a transformer architecture that can flexibly condition on a variable number of in-context image stimuli, learning an inductive bias over multiple subjects. During training, we explicitly optimize the model for in-context learning. By jointly conditioning on image features and voxel activations, our model learns to directly generate better-performing voxelwise models of higher visual cortex. We demonstrate that BraInCoRL consistently outperforms existing voxelwise encoder designs in a low-data regime when evaluated on entirely novel images, while also exhibiting strong test-time scaling behavior. The model also generalizes to an entirely new visual fMRI dataset, which uses different subjects and fMRI data acquisition parameters. Further, BraInCoRL facilitates better interpretability of neural signals in higher visual cortex by attending to semantically relevant stimuli. Finally, we show that our framework enables interpretable mappings from natural language queries to voxel selectivity.
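
The abstract's final claim, mapping natural language queries to voxel selectivity, admits a simple hedged reading: if the image features live in a CLIP-style joint image-text embedding space, a text embedding can stand in for an image feature and be scored by the in-context model. The sketch below assumes exactly that, reusing the hypothetical `VoxelInContextTransformer` from above; the paper may implement this mapping differently.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

# CLIP ViT-B/32 projects text to the same 512-d space as its image embeddings.
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def voxel_selectivity(model, ctx_feats, ctx_resp, queries):
    """Score how strongly a voxel's predicted response tracks each text query.

    model:     the hypothetical VoxelInContextTransformer sketched earlier
    ctx_feats: (1, K, 512) image features for K few-shot fMRI measurements
    ctx_resp:  (1, K)      the voxel's measured responses to those images
    queries:   list of natural-language strings, e.g. ["a human face"]
    """
    inputs = proc(text=queries, return_tensors="pt", padding=True)
    with torch.no_grad():
        txt = clip.get_text_features(**inputs)       # (Q, 512) text embeddings
        txt = txt / txt.norm(dim=-1, keepdim=True)   # unit-normalize, as CLIP does
        return model(ctx_feats, ctx_resp, txt.unsqueeze(0))  # (1, Q) predicted responses

# e.g. voxel_selectivity(model, feats, resp, ["a human face", "an outdoor scene"])
```

A high predicted response to "a human face" relative to other queries would then flag the voxel as face-selective, which is the kind of interpretable language-to-selectivity mapping the abstract describes.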
Problem

Research questions and friction points this paper is trying to address.

Predict voxelwise neural responses from few-shot examples
Overcome limitations of large-scale fMRI data acquisition
Improve interpretability of neural signals in visual cortex
Innovation

Methods, ideas, or system contributions that make the work stand out.

Meta-learning in-context transformer for neural prediction (see the training-loop sketch after this list)
Few-shot learning without finetuning for new subjects
Joint conditioning on image features and voxel activations
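
As referenced in the first item above, here is a minimal sketch of what an episodic meta-training step could look like: sample a subject and a voxel, split that voxel's (image, response) pairs into context and query sets, and optimize the model to predict the held-out query responses from the context alone. The dataset layout and sampling scheme are assumptions for illustration, not the paper's exact training recipe; `model` is the hypothetical `VoxelInContextTransformer` from the first sketch.

```python
import random
import torch
import torch.nn.functional as F

def meta_train_step(model, optimizer, subjects, k_context=32, n_query=16):
    # subjects: list of (feats, resp) pairs, one per training subject, where
    # feats is (N, d_feat) image features and resp is (N, V) responses for V voxels.
    feats, resp = random.choice(subjects)
    voxel = torch.randint(resp.shape[1], (1,)).item()
    # Random context/query split over that subject's stimuli.
    idx = torch.randperm(feats.shape[0])[: k_context + n_query]
    ctx, qry = idx[:k_context], idx[k_context:]
    pred = model(feats[ctx].unsqueeze(0),        # (1, K, d_feat) context features
                 resp[ctx, voxel].unsqueeze(0),  # (1, K) context responses
                 feats[qry].unsqueeze(0))        # (1, Q, d_feat) query features
    # The loss is on held-out queries, so the model is explicitly optimized
    # to perform in-context learning rather than to memorize any one voxel.
    loss = F.mse_loss(pred, resp[qry, voxel].unsqueeze(0))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Sampling episodes across many subjects and voxels is what instills the shared inductive bias that the summary credits for generalization to novel subjects without fine-tuning.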