🤖 AI Summary
Prior studies comparing neural representations for image classification between primate visual cortex and deep neural networks (DNNs) yield inconsistent conclusions, largely because conventional representational similarity methods assess global alignment while neglecting task-relevant decision consistency. Method: We propose Decision Variable Correlation (DVC), a novel sample-level similarity metric that quantifies strategy alignment, i.e., consistency in classification decisions, between observers (e.g., macaque V4/IT cortex and DNNs). Contribution/Results: Systematic evaluation across ImageNet-1k models, adversarially trained networks, and large-scale pretrained models reveals that inter-model DVC approximates inter-macaque DVC, whereas model–macaque DVC is substantially lower and decreases further with higher ImageNet accuracy. Neither scaling training data nor improving robustness bridges this gap. This work provides evidence of a fundamental divergence in task-relevant decision representations between state-of-the-art DNNs and primate ventral visual cortex.
📝 Abstract
Previous studies have compared the brain and deep neural networks trained on image classification. Intriguingly, while some suggest that their representations are highly similar, others argue the opposite. Here, we propose a new approach to characterize the similarity of the decision strategies of two observers (models or brains) using decision variable correlation (DVC). DVC quantifies the correlation between decoded decisions on individual samples in a classification task and thus can capture task-relevant information rather than general representational alignment. We evaluate this method using monkey V4/IT recordings and models trained on image classification tasks. We find that model–model similarity is comparable to monkey–monkey similarity, whereas model–monkey similarity is consistently lower and, surprisingly, decreases with increasing ImageNet-1k performance. While adversarial training enhances robustness, it does not improve model–monkey similarity in task-relevant dimensions; however, it markedly increases model–model similarity. Similarly, pre-training on larger datasets does not improve model–monkey similarity. These results suggest a fundamental divergence between the task-relevant representations in monkey V4/IT and those learned by models trained on image classification tasks.
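To make the metric concrete, below is a minimal sketch of how a DVC comparison could be computed for a binary classification task. It assumes a logistic-regression decoder fitted separately on each observer's representation (neural site responses or model activations) and a within-class Pearson correlation of the resulting per-sample decision variables; the function names (`decision_variables`, `dvc`) and these specific decoding choices are illustrative assumptions, not the paper's exact pipeline.

```python
# Sketch of decision variable correlation (DVC) between two observers,
# assuming a linear decoder per observer and a shared train/test split.
import numpy as np
from sklearn.linear_model import LogisticRegression

def decision_variables(features, labels, train_idx, test_idx):
    """Fit a linear decoder on one observer's representation and return
    signed decision variables (distances to the decision boundary) for
    the held-out samples of a binary classification task."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(features[train_idx], labels[train_idx])
    return clf.decision_function(features[test_idx])

def dvc(dv_a, dv_b, test_labels):
    """Pearson correlation of two observers' decision variables on the same
    held-out samples, computed within each class (so class-driven signal
    does not inflate the correlation) and averaged across classes."""
    rs = []
    for c in np.unique(test_labels):
        m = test_labels == c
        if m.sum() > 2:
            rs.append(np.corrcoef(dv_a[m], dv_b[m])[0, 1])
    return float(np.mean(rs))

# Usage sketch: both observers see the same images and the same split.
# rng = np.random.default_rng(0)
# idx = rng.permutation(len(labels))
# train_idx, test_idx = idx[: len(idx) // 2], idx[len(idx) // 2 :]
# dv_monkey = decision_variables(v4_it_responses, labels, train_idx, test_idx)
# dv_model = decision_variables(model_features, labels, train_idx, test_idx)
# print(dvc(dv_monkey, dv_model, labels[test_idx]))
```

Because the correlation is taken over individual held-out samples, this kind of measure reflects whether the two observers find the same images easy or hard for the task, rather than whether their representations are globally aligned.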