🤖 AI Summary
In unsupervised representation learning, assessing embedding quality without ground-truth labels has long depended on strong assumptions, such as linear separability or a particular covariance structure. To address this, we propose *Persistence*, which we present as the first unsupervised, topology-aware evaluation metric grounded in persistent homology. It quantifies the multi-scale geometric structure and topological richness of an embedding space, giving a unified characterization of global, nonlinear patterns without labels or model-specific assumptions, and so avoids the theoretical limitations of conventional metrics. Validated empirically across diverse domains, Persistence achieves the highest correlation with downstream task performance (average Pearson *r* = 0.89), significantly outperforming existing unsupervised evaluation methods, and it supports model selection and hyperparameter optimization.
📝 Abstract
Modern representation learning increasingly relies on unsupervised and self-supervised methods trained on large-scale unlabeled data. While these approaches achieve impressive generalization across tasks and domains, evaluating embedding quality without labels remains an open challenge. In this work, we propose Persistence, a topology-aware metric based on persistent homology that quantifies the geometric structure and topological richness of embedding spaces in a fully unsupervised manner. Unlike metrics that assume linear separability or rely on covariance structure, Persistence captures global and multi-scale organization. Empirical results across diverse domains show that Persistence consistently achieves top-tier correlations with downstream performance, outperforming existing unsupervised metrics and enabling reliable model and hyperparameter selection.
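To make the idea of persistent homology on an embedding cloud more concrete, here is a minimal sketch restricted to dimension 0 (connected components). It is not the Persistence metric itself, whose exact definition is not given in this summary. It relies only on a standard fact: in a Vietoris-Rips filtration, the finite death times of 0-dimensional features equal the edge lengths of a Euclidean minimum spanning tree. The function names `h0_death_times` and `total_h0_persistence` are hypothetical, and summing the death times is just one simple way to get a scalar summary.

```python
import math
from itertools import combinations


def h0_death_times(points):
    """Finite H0 death times of the Vietoris-Rips filtration of `points`.

    Every point is born at scale 0. A component dies at the scale where it
    merges into another one, which is exactly the length of a minimum
    spanning tree edge (computed here with Kruskal's algorithm).
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise Euclidean distances, shortest first.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two components merge at this scale, so one of them dies.
            parent[root_i] = root_j
            deaths.append(length)
    return deaths


def total_h0_persistence(points):
    """Illustrative scalar summary: sum of finite H0 lifetimes."""
    return sum(h0_death_times(points))


# Toy "embedding": two well-separated clusters in the plane.
cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 0.0), (11.0, 0.0)]
print(h0_death_times(cloud))        # [1.0, 1.0, 1.0, 9.0]
print(total_h0_persistence(cloud))  # 12.0
```

In the toy example, the single long-lived feature (death at scale 9.0) reflects the gap between the two clusters, while the short-lived ones reflect within-cluster spacing. Higher-dimensional features such as loops require a full persistent homology library and are outside the scope of this sketch.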