🤖 AI Summary
Problem: Downstream probing assesses only the task-relevant information in representations, failing to characterize critical properties such as equivariance, invariance, and disentanglement that govern interpretability and generalization; moreover, existing evaluation frameworks lack standardization, modularity, and cross-modal applicability.
Method: We propose the first unified framework for assessing representation quality beyond downstream tasks, employing a controlled factorial probe design to systematically quantify informativeness, equivariance, invariance, and disentanglement. The framework is modular and interpretable, and it supports cross-modal analysis (e.g., image and speech).
Contribution/Results: It establishes the first standardized, multi-dimensional protocol for quantifying how factors of variation are encoded in representations. Experiments reveal substantial divergence in intrinsic representation properties, even among models with comparable downstream performance, enabling fine-grained understanding, diagnosis, and optimization of representations. This work introduces a new paradigm and a practical toolkit for representation evaluation beyond task-specific metrics.
📝 Abstract
Downstream probing has been the dominant method for evaluating model representations, an important process given the increasing prominence of self-supervised learning and foundation models. However, downstream probing primarily assesses the availability of task-relevant information in the model's latent space, overlooking attributes such as equivariance, invariance, and disentanglement, which contribute to the interpretability, adaptability, and utility of representations in real-world applications. While some attempts have been made to measure these qualities in representations, no unified evaluation framework with modular, generalizable, and interpretable metrics exists. In this paper, we argue for the importance of representation evaluation beyond downstream probing. We introduce a standardized protocol to quantify informativeness, equivariance, invariance, and disentanglement of factors of variation in model representations. We use it to evaluate representations from a variety of models in the image and speech domains, spanning different architectures and pretraining approaches, on identified controllable factors of variation. We find that representations from models with similar downstream performance can behave substantially differently with regard to these attributes. This suggests that the respective mechanisms underlying their downstream performance are functionally different, prompting new research directions to understand and improve representations.
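The abstract does not spell out the protocol itself, but a minimal sketch of the kind of probe-based measurements it describes might look like the following. This is an illustrative Python example, not the paper's implementation: it assumes paired embeddings of inputs that differ only in one controlled factor of variation, and the `informativeness`, `invariance`, and `equivariance` functions below are simplified stand-in definitions, not the paper's metrics.

```python
# Illustrative sketch (not the paper's implementation) of probe-based
# measurements over a controlled factor of variation.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def informativeness(embeddings: np.ndarray, factor_labels: np.ndarray) -> float:
    """Accuracy of a linear probe predicting the factor from the representation."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        embeddings, factor_labels, test_size=0.3, random_state=0
    )
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return probe.score(X_te, y_te)


def invariance(emb_original: np.ndarray, emb_perturbed: np.ndarray) -> float:
    """One minus the mean normalized distance between embeddings of input pairs
    that differ only in the probed factor; values near 1 suggest invariance."""
    dists = np.linalg.norm(emb_original - emb_perturbed, axis=1)
    scale = np.linalg.norm(emb_original, axis=1) + 1e-8
    return float(1.0 - np.mean(dists / scale))


def equivariance(emb_original: np.ndarray, emb_perturbed: np.ndarray,
                 factor_shift: np.ndarray) -> float:
    """Accuracy of a linear probe decoding the applied factor change from the
    embedding difference; high accuracy suggests a structured (equivariant)
    response to the factor rather than an arbitrary one."""
    deltas = emb_perturbed - emb_original
    X_tr, X_te, y_tr, y_te = train_test_split(
        deltas, factor_shift, test_size=0.3, random_state=0
    )
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return probe.score(X_te, y_te)
```

In this hypothetical setup, comparing these three scores across models and across factors (e.g., speaker identity in speech, object pose in images) would surface the kind of divergences the abstract reports among models with similar downstream performance.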