🤖 AI Summary
This work addresses the geometric incompatibility of latent representation spaces across neural networks, which arises from differences in architecture, pretraining data, training objective, or random seed and impedes cross-model semantic alignment. The authors introduce SEMASIA, a large-scale collection of embeddings extracted from approximately 1,700 pretrained vision models on eight standard image-classification benchmarks and paired with structured metadata. Three analyses follow: an examination of individual latent spaces, which shows consistent prototype-like clustering and hierarchical semantic neighborhoods across models and datasets; a benchmark of supervised alignment mappings, scored by reconstruction error and downstream task performance; and a large-scale regression analysis relating pretraining-data composition and model scale to geometric and probing properties of the embeddings.
📝 Abstract
Latent representations learned by neural networks often exhibit semantic structure, where concept similarity is reflected by geometric proximity in embedding space. However, comparing such spaces across models remains difficult: changes in architecture, pretraining data, objective, or random seed can yield embeddings with similar content but incompatible geometry. This latent space alignment problem is central to interpretability, transfer and multimodal learning, federated systems, and semantic communication, yet progress remains limited by the lack of large-scale, model-diverse, and metadata-rich benchmarks. To address this gap, we introduce SEMASIA, a large-scale collection of latent representations extracted from approximately 1,700 pretrained vision models across eight standard image-classification benchmarks. SEMASIA pairs embeddings with structured metadata describing architectures, training regimes, pretraining sources, and model scale. We demonstrate three applications of the resource. First, we analyze the conceptual organization of individual latent spaces, showing consistent prototype-like clustering and hierarchical semantic neighborhoods across models and datasets. Second, we benchmark supervised alignment mappings between latent spaces using reconstruction error and downstream task performance. Third, we perform a large-scale regression analysis of how pretraining-data complexity, specialization, transfer learning, augmentation, and model scale relate to geometric and probing properties of embeddings. By coupling representational scale with standardized metadata, SEMASIA provides a reproducible foundation for studying latent geometry, evaluating alignment methods, and developing next-generation heterogeneous and interoperable AI systems.
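To make the second application concrete, the following is a minimal sketch of a supervised alignment mapping between two latent spaces, scored by reconstruction error and by a simple downstream probe. It is an illustration under stated assumptions, not the procedure used for SEMASIA: the embeddings are synthetic (two rotated, noisy views of the same clustered data), the mappings are a least-squares linear map and an orthogonal Procrustes map, and the downstream task is a nearest-centroid classifier. All function and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Synthetic stand-in for two models' embeddings of the same images ---
# (assumption: real embeddings would be loaded from the benchmark instead)
n, d, k = 400, 16, 4
centers = 3.0 * rng.normal(size=(k, d))
labels = rng.integers(0, k, size=n)
shared = centers[labels] + rng.normal(size=(n, d))   # common semantic content

def random_orthogonal(dim):
    # QR decomposition of a Gaussian matrix yields an orthogonal factor
    q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
    return q

emb_a = shared @ random_orthogonal(d)                             # "model A"
emb_b = shared @ random_orthogonal(d) + 0.01 * rng.normal(size=(n, d))  # "model B"

# Paired train/test split: the same images are embedded by both models
train, test = slice(0, 300), slice(300, n)

def fit_linear_map(src, tgt):
    # Unconstrained least squares: W minimizing ||src @ W - tgt||_F
    w, *_ = np.linalg.lstsq(src, tgt, rcond=None)
    return w

def fit_orthogonal_map(src, tgt):
    # Orthogonal Procrustes (equal dimensions): R = U @ Vt from SVD of src^T tgt
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt

def relative_error(src, tgt, w):
    # Frobenius-norm reconstruction error, normalized by the target norm
    return float(np.linalg.norm(src @ w - tgt) / np.linalg.norm(tgt))

def nearest_centroid_accuracy(train_emb, train_y, test_emb, test_y):
    # Class centroids are estimated in the target space, then applied to test_emb
    classes = np.unique(train_y)
    centroids = np.stack([train_emb[train_y == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(test_emb[:, None, :] - centroids[None, :, :], axis=-1)
    return float((classes[dists.argmin(axis=1)] == test_y).mean())

w_linear = fit_linear_map(emb_a[train], emb_b[train])
w_orth = fit_orthogonal_map(emb_a[train], emb_b[train])

# Held-out reconstruction error of each mapping
err_linear = relative_error(emb_a[test], emb_b[test], w_linear)
err_orth = relative_error(emb_a[test], emb_b[test], w_orth)

# Downstream check: classify mapped model-A embeddings with a model-B probe
acc_mapped = nearest_centroid_accuracy(
    emb_b[train], labels[train], emb_a[test] @ w_orth, labels[test]
)

print(f"linear map error:     {err_linear:.4f}")
print(f"orthogonal map error: {err_orth:.4f}")
print(f"mapped probe accuracy: {acc_mapped:.3f}")
```

The orthogonal map preserves distances and angles, so it tests whether two spaces differ only by a rotation or reflection; the unconstrained linear map is more flexible but can distort geometry. Reporting both reconstruction error and downstream accuracy matters because a mapping can score well on one while degrading the other.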