Universal dimensions of visual representation

📅 2024-08-23
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
It remains unclear whether representational similarities between artificial vision models and human visual cortex arise from shared architectural priors or from universal image-processing principles. Method: Using representational similarity analysis (RSA) and fMRI response modeling, we systematically compared intermediate representations across hundreds of heterogeneous deep visual models. Contribution/Results: We discover that diverse models converge onto a compact set of fewer than ten highly generalizable representational dimensions—stable across architectures and vision tasks, and significantly better aligned with human V1–IT fMRI responses than model-specific representations. Remarkably, only eight such dimensions preserve over 90% of the model–brain representational similarity. These findings indicate that deep representational alignment between artificial and biological vision stems not from architectural idiosyncrasies but from shared, universal structural priors in image representation. This provides a new paradigm for uncovering fundamental principles of visual intelligence.
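The summary above leans on representational similarity analysis (RSA). A minimal sketch of how that comparison works (the data, sizes, and function names below are illustrative, not from the paper): build a representational dissimilarity matrix (RDM) for each system over the same stimuli, then correlate the RDMs' upper triangles.

```python
import numpy as np
from scipy.stats import spearmanr

def rdm(responses):
    """Representational dissimilarity matrix: 1 - Pearson correlation
    between response patterns for every pair of stimuli.
    responses: (n_stimuli, n_features) array."""
    return 1.0 - np.corrcoef(responses)

def rsa_score(responses_a, responses_b):
    """Spearman correlation between the upper triangles of two RDMs,
    a standard summary statistic in RSA."""
    iu = np.triu_indices(responses_a.shape[0], k=1)
    return spearmanr(rdm(responses_a)[iu], rdm(responses_b)[iu]).correlation

# Toy example: two "systems" responding to the same 20 stimuli,
# both inheriting structure from a shared low-dimensional latent space.
rng = np.random.default_rng(0)
shared = rng.normal(size=(20, 5))
model = shared @ rng.normal(size=(5, 64))   # e.g. a network layer (20 x 64)
brain = shared @ rng.normal(size=(5, 32))   # e.g. voxel responses (20 x 32)
print(rsa_score(model, brain))
```

Because the RDM abstracts away each system's feature space, representations of different dimensionality (here 64 features vs. 32 voxels) become directly comparable.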

📝 Abstract
Do neural network models of vision learn brain-aligned representations because they share architectural constraints and task objectives with biological vision or because they learn universal features of natural image processing? We characterized the universality of hundreds of thousands of representational dimensions from visual neural networks with varied construction. We found that networks with varied architectures and task objectives learn to represent natural images using a shared set of latent dimensions, despite appearing highly distinct at a surface level. Next, by comparing these networks with human brain representations measured with fMRI, we found that the most brain-aligned representations in neural networks are those that are universal and independent of a network's specific characteristics. Remarkably, each network can be reduced to fewer than ten of its most universal dimensions with little impact on its representational similarity to the human brain. These results suggest that the underlying similarities between artificial and biological vision are primarily governed by a core set of universal image representations that are convergently learned by diverse systems.
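The abstract's reduction claim — that a network can be cut to fewer than ten dimensions with little loss of brain similarity — can be illustrated with a toy sketch. Here PCA stands in for the paper's universality ranking (the paper scores dimensions by how well they generalize across independent networks, which this sketch does not reproduce), and Euclidean-distance RDMs are used so that projecting onto a basis spanning the data preserves pairwise distances exactly.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rsa(a, b):
    """Spearman correlation between condensed Euclidean-distance RDMs."""
    return spearmanr(pdist(a), pdist(b)).correlation

def top_k_dims(responses, k):
    """Project stimulus responses onto their top-k principal components.
    (An illustrative stand-in for the paper's universality ranking.)"""
    centered = responses - responses.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T

# Simulated data: a 512-unit layer and 100 voxels, both driven by
# 8 shared latent dimensions (so rank-8 captures everything).
rng = np.random.default_rng(1)
latent = rng.normal(size=(50, 8))
layer = latent @ rng.normal(size=(8, 512))
brain = latent @ rng.normal(size=(8, 100))

full = rsa(layer, brain)
reduced = rsa(top_k_dims(layer, 8), brain)  # keep only 8 dimensions
print(f"full: {full:.3f}  reduced (k=8): {reduced:.3f}")
```

In this toy setting the two scores match because the layer truly has rank 8; the paper's empirical finding is the nontrivial analogue for real networks, where the full representation has thousands of dimensions.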
Problem

Research questions and friction points this paper is trying to address.

Visual Neural Network Models
Brain Vision Processing
Common Visual Understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Visual Neural Networks
Core Image Processing Techniques
Brain-like Image Understanding
Zirui Chen
Department of Cognitive Science, Johns Hopkins University, Baltimore, MD 21218
Michael F. Bonner
Department of Cognitive Science, Johns Hopkins University, Baltimore, MD 21218