🤖 AI Summary
This study addresses the challenge of comparing high-dimensional neural representations across neuroscience and artificial intelligence: specifically, which similarity measures best reveal functional correspondences and divergences. We systematically evaluate eight widely used representational similarity metrics in the visual domain. These span alignment-based, CCA-based, inner-product-kernel-based, and nearest-neighbor methods, including linear CKA and Procrustes distance. The benchmark for each metric is its agreement with behavioral functional alignment (e.g., recognition accuracy, generalization, robustness). Our evaluation spans both biological neural data and artificial neural network models. Results show that geometry-sensitive metrics, particularly linear CKA and Procrustes distance, consistently outperform linear predictivity: they align more closely with behavioral measures and effectively distinguish trained from untrained models, whereas linear predictivity exhibits only moderate behavioral alignment. This work establishes a behavior-grounded benchmark for representational similarity measures, providing a principled, cross-domain methodology for the mechanistic interpretation and comparative analysis of neural computation.
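The summary treats behavior as the reference against which similarity metrics are scored. One simple way to operationalize this is to compute distances between every pair of models twice, once representationally and once behaviorally, and then rank-correlate the two sets of pairwise distances. The NumPy sketch below does this. It is an illustrative assumption, not the study's documented procedure. The following are all hypothetical: the behavioral distance (mean absolute gap in per-condition accuracy), the use of Spearman correlation over model pairs, the helper names `behavioral_distances` and `metric_behavior_alignment`, and the random toy accuracy data.

```python
import numpy as np

def _ranks(v):
    # Ranks 0..n-1; assumes no ties, which holds for continuous toy data.
    r = np.empty(len(v))
    r[np.argsort(v)] = np.arange(len(v))
    return r

def spearman(a, b):
    # Spearman correlation is the Pearson correlation of the ranks.
    return float(np.corrcoef(_ranks(a), _ranks(b))[0, 1])

def behavioral_distances(acc):
    """Hypothetical behavioral distance between models.

    acc: (n_models, n_conditions) accuracy of each model on each test condition.
    Returns an (n_models, n_models) matrix of mean absolute accuracy gaps.
    """
    return np.abs(acc[:, None, :] - acc[None, :, :]).mean(axis=2)

def metric_behavior_alignment(rep_dist, beh_dist):
    """Spearman correlation between representational and behavioral
    distances, taken over every unordered pair of models (hypothetical score)."""
    iu = np.triu_indices_from(rep_dist, k=1)  # upper triangle, no diagonal
    return spearman(rep_dist[iu], beh_dist[iu])

# Toy data: 6 hypothetical models scored on 10 hypothetical test conditions.
rng = np.random.default_rng(0)
acc = rng.uniform(0.2, 0.9, size=(6, 10))
beh = behavioral_distances(acc)

# A metric whose distances are a monotone function of behavioral distance...
tracking = np.sqrt(beh)
# ...and a symmetric random "metric" unrelated to behavior.
unrelated = rng.uniform(size=beh.shape)
unrelated = (unrelated + unrelated.T) / 2

print(metric_behavior_alignment(tracking, beh))   # 1.0 (up to rounding)
print(metric_behavior_alignment(unrelated, beh))  # depends on the random draw
```

Because the score is rank-based, any metric whose distances are a monotone function of the behavioral distances (here `np.sqrt(beh)`) receives a perfect score of 1. A metric with no systematic relationship to behavior does not. In practice `scipy.stats.spearmanr` could replace the hand-written rank correlation, which also handles ties.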
📝 Abstract
Neuroscience and artificial intelligence (AI) both face the challenge of interpreting high-dimensional neural data. Comparative analysis of such data is crucial for revealing the mechanisms these complex systems share and the ways they differ. Representational comparisons are widely used, and many classes of comparison methods exist, yet a critical question remains: which metrics are most suitable for these comparisons? Some studies evaluate metrics by their ability to differentiate models of different origins or constructions (e.g., different architectures). Another approach is to assess how well they distinguish models that exhibit distinct behaviors. To investigate this, we examine the degree of alignment between various representational similarity measures and behavioral outcomes, using group statistics and a comprehensive suite of behavioral metrics for comparison. We evaluated eight commonly used representational similarity metrics in the visual domain, spanning alignment-based, Canonical Correlation Analysis (CCA)-based, inner product kernel-based, and nearest-neighbor methods. Metrics such as linear Centered Kernel Alignment (CKA) and Procrustes distance, which emphasize the overall geometric structure, or shape, of representations, excelled both at differentiating trained from untrained models and at aligning with behavioral measures. In contrast, metrics such as linear predictivity, which is commonly used in neuroscience, showed only moderate alignment with behavior. These insights are crucial for selecting metrics that emphasize behaviorally meaningful comparisons in NeuroAI research.
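The abstract contrasts metrics that emphasize representational geometry with linear predictivity. The NumPy sketch below illustrates that contrast, under stated assumptions:

- Linear CKA uses the standard centered formula.
- Procrustes distance is the Euclidean variant computed after centering and unit-Frobenius-norm scaling, one common convention among several.
- Linear predictivity is simplified to in-sample, unregularized least-squares R². Neuroscience pipelines usually use cross-validated, often regularized, regression instead.

The function names and the random toy data are illustrative, not taken from the study.

```python
import numpy as np

def _center(X):
    # Rows are stimuli, columns are units; remove each unit's mean response.
    return X - X.mean(axis=0, keepdims=True)

def linear_cka(X, Y):
    """Linear CKA between two (n_stimuli, units) response matrices; in [0, 1]."""
    Xc, Yc = _center(X), _center(Y)
    cross = np.linalg.norm(Yc.T @ Xc, "fro") ** 2
    return cross / (np.linalg.norm(Xc.T @ Xc, "fro") * np.linalg.norm(Yc.T @ Yc, "fro"))

def procrustes_distance(X, Y):
    """Euclidean Procrustes distance: min over orthogonal Q of ||Xn - Yn Q||_F,
    after centering and scaling each matrix to unit Frobenius norm.
    Uses min_Q ||Xn - Yn Q||_F^2 = 2 - 2 * nuclear_norm(Xn^T Yn); in [0, sqrt(2)]."""
    Xn = _center(X)
    Xn = Xn / np.linalg.norm(Xn, "fro")
    Yn = _center(Y)
    Yn = Yn / np.linalg.norm(Yn, "fro")
    nuclear = np.linalg.norm(Xn.T @ Yn, "nuc")
    return float(np.sqrt(max(0.0, 2.0 - 2.0 * nuclear)))  # clip tiny negatives from rounding

def linear_predictivity(X, Y):
    """Simplified linear predictivity: in-sample R^2 of ordinary least squares
    predicting Y from X (real pipelines usually cross-validate and regularize)."""
    Xc, Yc = _center(X), _center(Y)
    B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    resid = Yc - Xc @ B
    return float(1.0 - (resid ** 2).sum() / (Yc ** 2).sum())

# Toy data: 200 stimuli x 50 units of random "responses".
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
Q, _ = np.linalg.qr(rng.standard_normal((50, 50)))  # random orthogonal matrix
rotated = X @ Q                                     # same geometry, new axes
stretched = X * np.linspace(0.1, 3.0, 50)           # unequal per-unit rescaling

for name, Y in [("rotated", rotated), ("stretched", stretched)]:
    print(name,
          round(float(linear_cka(X, Y)), 3),
          round(procrustes_distance(X, Y), 3),
          round(linear_predictivity(X, Y), 3))
# first line printed: rotated 1.0 0.0 1.0
```

Both transformed copies are perfectly linearly predictable from `X` (R² of 1). Only the rotation, however, leaves CKA at 1 and Procrustes distance at 0. The anisotropic stretch lowers CKA and raises Procrustes distance. This shows one way that shape-sensitive metrics can separate representations that linear predictivity treats as equivalent.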