🤖 AI Summary
To address the challenges of evaluating AI model reliability, comparing models, and interchanging small and large models, this paper proposes a performance assessment paradigm based on cross-model correlations between neuron outputs. Unlike conventional methods, it requires no access to internal test sets and enables independent, verifiable evaluation of generalization and robustness via neuron-level similarity matching, pairwise network correlation quantification, and unsupervised cross-architecture alignment. Key contributions include: (1) the first neuron-level cross-model correlation metric; (2) an empirically validated, transferable relationship linking correlation scores to robustness; and (3) support for memory-efficient model substitution, in which a smaller model replaces a larger one without significant performance degradation. Experiments show a 32% reduction in performance prediction error compared to traditional metrics and a correlation of 0.87 between predicted and observed generalization across ImageNet and CIFAR transfer tasks. The implementation is publicly available.
📝 Abstract
As Artificial Intelligence (AI) models are increasingly integrated into critical systems, the need for a robust framework to establish the trustworthiness of AI grows more pressing. While collaborative efforts have established conceptual foundations for such a framework, there remains a significant gap in developing concrete, technically robust methods for assessing AI model quality and performance. A critical drawback of traditional methods for assessing the validity and generalizability of models is their dependence on internal developer datasets, which makes it difficult to independently assess and verify performance claims. This paper introduces a novel approach for assessing a newly trained model's performance against another, known model by calculating the correlation between the two neural networks. The proposed method evaluates correlation by determining whether, for each neuron in one network, there exists a neuron in the other network that produces similar output. This approach has implications for memory efficiency: when networks of different sizes are highly correlated, the smaller network can be used in place of the larger one. The method also provides insight into robustness: if two networks are highly correlated and one has demonstrated robustness in production environments, the other is likely to exhibit similar robustness. This contribution advances the technical toolkit for responsible AI, supporting more comprehensive and nuanced evaluations of AI models to ensure their safe and effective deployment. Code is available at https://github.com/aheldis/Cross-model-correlation.git.
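The neuron-matching idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository above for that). It assumes that "similar output" is measured by absolute Pearson correlation over a shared batch of inputs, and that a neuron counts as matched when its best correlation exceeds a threshold. The function names `cross_correlation` and `match_score` and the default threshold of 0.9 are hypothetical, chosen for illustration only.

```python
import numpy as np


def cross_correlation(acts_a, acts_b, eps=1e-8):
    """Pearson correlation between every neuron of A and every neuron of B.

    acts_a: array of shape (n_samples, n_neurons_a)
    acts_b: array of shape (n_samples, n_neurons_b)
    Both must be recorded on the same inputs, in the same order.
    Returns an array of shape (n_neurons_a, n_neurons_b).
    """
    # Center each neuron's activations over the batch.
    a = acts_a - acts_a.mean(axis=0)
    b = acts_b - acts_b.mean(axis=0)
    # Scale each column to unit length; eps guards against constant neurons.
    a = a / (np.linalg.norm(a, axis=0) + eps)
    b = b / (np.linalg.norm(b, axis=0) + eps)
    return a.T @ b


def match_score(acts_a, acts_b, threshold=0.9):
    """For each neuron in A, find its best-correlated neuron in B.

    Returns (best, fraction): the per-neuron best absolute correlation and
    the fraction of A's neurons whose best match reaches the threshold.
    The threshold is an assumption made for this sketch.
    """
    corr = np.abs(cross_correlation(acts_a, acts_b))
    best = corr.max(axis=1)
    return best, float((best >= threshold).mean())


# Toy usage: network B contains rescaled copies of A's neurons plus extra
# unrelated neurons, so every neuron in A should find a match in B.
rng = np.random.default_rng(0)
acts_a = rng.normal(size=(200, 5))
acts_b = np.hstack([2.0 * acts_a[:, ::-1] + 1.0, rng.normal(size=(200, 3))])
best, fraction = match_score(acts_a, acts_b)
```

Note that the score is directional: `match_score(acts_a, acts_b)` asks whether B covers A's neurons, which fits the case of checking whether one network can stand in for another. Whether the paper aggregates per-neuron matches in this way is not stated in the abstract.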