AI Summary
Existing XAI methods struggle to compare the semantic differences between two learned representations in an unsupervised, interpretable way. This paper introduces RDX, the first interpretable, label-free, gradient-free framework for representation difference analysis. RDX leverages canonical correlation analysis (CCA) to align the two representations and constructs difference saliency maps that localize concept-level discrepancies. It integrates heatmap visualization with concept attribution analysis and supports mainstream architectures, including CNNs and Transformers. Evaluated on subsets of ImageNet and iNaturalist, RDX uncovers inter-model semantic biases and latent patterns in the data. In controlled experiments, it accurately recovers pre-specified conceptual differences, significantly outperforming baselines such as Grad-CAM and SHAP. By bridging a critical gap in interpretable representation comparison, RDX establishes a new paradigm for deep model diagnosis and model-evolution analysis.
Abstract
We propose a method for discovering and visualizing the differences between two learned representations, enabling more direct and interpretable model comparisons. We validate our method, which we call Representational Differences Explanations (RDX), by using it to compare models with known conceptual differences and demonstrate that it recovers meaningful distinctions where existing explainable AI (XAI) techniques fail. Applied to state-of-the-art models on challenging subsets of the ImageNet and iNaturalist datasets, RDX reveals both insightful representational differences and subtle patterns in the data. Although comparison is a cornerstone of scientific analysis, current tools in machine learning, namely post hoc XAI methods, struggle to support model comparison effectively. Our work addresses this gap by introducing an effective and explainable tool for contrasting model representations.