Sparse Autoencoders Bridge The Deep Learning Model and The Brain

πŸ“… 2025-06-10
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Establishing a direct, task-agnostic mapping between deep visual model representations and human fMRI voxel responses remains challenging. Method: We propose SAE-BrainMap, a framework that trains layer-wise sparse autoencoders (SAEs) on visual model activations and directly aligns them with voxel-level fMRI responses to the same stimuli, constructing a voxel dictionary projected onto the cortical surface. Contribution/Results: This is the first work to achieve fine-grained correspondence between model layers and brain voxels. It reveals a dynamic information flow in ViT-B/16$_{CLIP}$: early layers use low-level features to generate high-level semantics, while later layers reconstruct low-dimensional information. Experiments show a peak cosine similarity of 0.76 between SAE units and fMRI voxels, replicate the functional organization of canonical ROIs, and enable hierarchical, fine-grained mapping along the ventral visual pathway.

πŸ“ Abstract
We present SAE-BrainMap, a novel framework that directly aligns deep learning visual model representations with voxel-level fMRI responses using sparse autoencoders (SAEs). First, we train layer-wise SAEs on model activations and use cosine similarity to compute the correlations between SAE unit activations and cortical fMRI signals elicited by the same natural image stimuli, revealing strong activation correspondence (maximum similarity up to 0.76). Building on this alignment, we construct a voxel dictionary by optimally assigning the most similar SAE feature to each voxel, demonstrating that SAE units preserve the functional structure of predefined regions of interest (ROIs) and exhibit ROI-consistent selectivity. Finally, we establish a fine-grained hierarchical mapping between model layers and the human ventral visual pathway, and by projecting voxel dictionary activations onto individual cortical surfaces, we visualize the dynamic transformation of visual information in deep learning models. We find that ViT-B/16$_{CLIP}$ tends to use low-level information to generate high-level semantic information in the early layers and reconstructs the low-dimensional information later. Our results establish a direct, downstream-task-free bridge between deep neural networks and human visual cortex, offering new insights into model interpretability.
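The alignment step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, array shapes, and the simple per-voxel argmax assignment are assumptions (the paper describes an optimal assignment, which may be more involved).

```python
import numpy as np

def build_voxel_dictionary(sae_acts, voxel_resp):
    """Assign each fMRI voxel its most cosine-similar SAE unit.

    sae_acts:   (n_images, n_units)  SAE unit activations per stimulus
    voxel_resp: (n_images, n_voxels) fMRI responses to the same stimuli
    Returns (assignment, similarity), each of length n_voxels.
    """
    # L2-normalize each activation time course so dot products are cosines
    u = sae_acts / (np.linalg.norm(sae_acts, axis=0, keepdims=True) + 1e-8)
    v = voxel_resp / (np.linalg.norm(voxel_resp, axis=0, keepdims=True) + 1e-8)
    sim = u.T @ v                       # (n_units, n_voxels) cosine matrix
    assignment = sim.argmax(axis=0)     # best-matching SAE unit per voxel
    return assignment, sim.max(axis=0)
```

Projecting `assignment` (or the corresponding similarity values) onto a cortical surface mesh would then yield the voxel-dictionary visualizations the paper reports.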
Problem

Research questions and friction points this paper is trying to address.

Align deep learning model representations with fMRI responses
Map hierarchical visual pathways between models and brain
Analyze model interpretability via cortical activation patterns
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses sparse autoencoders for model-brain alignment
Constructs voxel dictionary via SAE feature matching
Maps model layers to visual pathway hierarchically
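The layer-wise SAEs at the core of the method can be sketched in a few lines. This is a hypothetical minimal form (overcomplete linear encoder with ReLU, linear decoder, L1 sparsity penalty); the paper's exact architecture and training objective may differ.

```python
import numpy as np

class SparseAutoencoder:
    """Tiny SAE sketch: overcomplete encoder + ReLU gives sparse codes,
    a linear decoder reconstructs the model activation. Illustrative only."""

    def __init__(self, d_model, d_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.normal(0.0, 0.02, (d_model, d_hidden))
        self.b_enc = np.zeros(d_hidden)
        self.W_dec = rng.normal(0.0, 0.02, (d_hidden, d_model))
        self.b_dec = np.zeros(d_model)

    def encode(self, x):
        # ReLU keeps codes non-negative and sparse
        return np.maximum(x @ self.W_enc + self.b_enc, 0.0)

    def forward(self, x, l1=1e-3):
        z = self.encode(x)
        x_hat = z @ self.W_dec + self.b_dec
        # reconstruction error plus L1 sparsity penalty on the codes
        loss = np.mean((x - x_hat) ** 2) + l1 * np.abs(z).mean()
        return x_hat, z, loss
```

One such SAE would be trained per model layer; the resulting unit activations `z` are what get compared against voxel responses.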
Ziming Mao
UC Berkeley (Distributed Systems, Big Data, AI Systems)
Jia Xu
Beijing Institute of Technology
Zeqi Zheng
Zhejiang University
Haofang Zheng
Beijing Institute of Technology
Dabin Sheng
Beijing Institute of Technology
Yaochu Jin
Westlake University
Guoyuan Yang
Beijing Institute of Technology