AI Summary
To address the uncontrolled topology of autoencoder (AE) latent spaces (LS), this paper proposes a geometric-loss-guided co-optimization framework for supervised AEs, the first to explicitly configure LS topology in this setting. The method jointly optimizes the encoder architecture and geometric constraint losses, enabling user-defined cluster positions and shapes, decoder-free label prediction, and cross-sample similarity assessment. Key contributions include zero-shot cross-dataset generalization, similarity estimation for unseen classes, and text-driven image retrieval without classifiers or language models. Experiments demonstrate 12–19% improvements in zero-shot accuracy on LIP, Market-1501, and WildTrack, and 78.3% mAP in cross-modal retrieval.
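The decoder-free prediction mentioned above follows from the LS configuration being known in advance: if each class has a user-defined cluster position, a sample's label is simply the nearest configured center, and similarity between samples can be read off as latent distance. A minimal sketch of this idea, assuming hypothetical 2-D centers and Euclidean distance (the paper's exact geometry may differ):

```python
import numpy as np

# Hypothetical pre-configured cluster centers in a 2-D latent space;
# the method lets the user place one center per class before training.
CENTERS = np.array([
    [0.0, 0.0],   # class 0
    [4.0, 0.0],   # class 1
    [0.0, 4.0],   # class 2
])

def predict_label(z: np.ndarray) -> int:
    """Decoder-free prediction: the nearest configured center wins."""
    dists = np.linalg.norm(CENTERS - z, axis=1)
    return int(np.argmin(dists))

def similarity(z_a: np.ndarray, z_b: np.ndarray) -> float:
    """Cross-sample similarity directly in the LS (negative distance)."""
    return float(-np.linalg.norm(z_a - z_b))
```

Because the centers are fixed a priori rather than learned, the same rule applies unchanged to embeddings of data from other datasets, which is what enables the zero-shot cross-dataset evaluation.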
Abstract
Autoencoders (AEs) are a simple yet powerful class of neural networks that compress data by projecting inputs into a low-dimensional latent space (LS). Although the LS is shaped by loss minimization during training, its properties and topology are not controlled directly. In this paper we focus on the properties of the AE LS and propose two methods for obtaining an LS with a desired topology, which we call LS configuration. The proposed methods are loss configuration, using a geometric loss term that acts directly in the LS, and encoder configuration. We show that the former reliably yields an LS with the desired configuration by defining the positions and shapes of LS clusters for a supervised AE (SAE). Knowing the LS configuration allows us to define a similarity measure in the LS that predicts labels or estimates similarity for multiple inputs without decoders or classifiers. We also show that this leads to more stable and interpretable training. An SAE trained for clothing-texture classification with the proposed method generalizes well, without fine-tuning, to unseen data from the LIP, Market-1501, and WildTrack datasets, and even allows similarity evaluation for unseen classes. We further illustrate the advantages of pre-configured LS similarity estimation with cross-dataset searches and text-based search using a text query, without language models.
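The geometric loss term described in the abstract acts directly on latent codes rather than on reconstructions. A plausible minimal form, sketched here under the assumption that it penalizes each code's distance to its class's predefined center (the paper's exact term, weighting `lam`, and `margin` parameter are illustrative, not the authors' definitions):

```python
import numpy as np

def geometric_loss(z, labels, centers, margin=0.0):
    """Pull each latent code toward its class's predefined center.

    z: (N, d) latent codes; labels: (N,) integer class ids;
    centers: (K, d) user-defined cluster positions in the LS.
    Hypothetical form: squared hinge on the distance to the own center.
    """
    target = centers[labels]                    # (N, d) per-sample targets
    d = np.linalg.norm(z - target, axis=1)      # distance to own center
    return float(np.mean(np.maximum(d - margin, 0.0) ** 2))

def total_loss(x, x_rec, z, labels, centers, lam=1.0):
    """Reconstruction MSE plus the geometric LS term."""
    rec = float(np.mean((x - x_rec) ** 2))
    return rec + lam * geometric_loss(z, labels, centers)
```

In training, such a term would be added to the usual reconstruction objective, so the encoder is driven both to preserve input information and to place each sample inside its user-specified cluster.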