Towards One Model for Classical Dimensionality Reduction: A Probabilistic Perspective on UMAP and t-SNE

📅 2024-05-27

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Mainstream dimensionality reduction methods such as UMAP and t-SNE lack a unified probabilistic interpretation, which hinders theoretical understanding and systematic comparison. Method: Building on the model introduced in ProbDR, the paper shows that these methods can be approximately recast as maximum a posteriori (MAP) inference in a model that places a Wishart distribution on the graph Laplacian (an estimate of the data precision matrix), with a mean given by a nonlinear covariance function evaluated on the latents. Contribution/Results: (1) a shared probabilistic reading of UMAP, t-SNE, and similar algorithms; (2) the observation that the variances corresponding to these covariances are low, which suggests potential misspecification; (3) a connection to Gaussian process latent variable models, since well-known kernels can describe the covariances implied by graph Laplacians; and (4) tools for studying similar dimensionality reduction methods.

📝 Abstract

This paper shows that dimensionality reduction methods such as UMAP and t-SNE can be approximately recast as MAP inference methods corresponding to a model introduced in ProbDR that describes the graph Laplacian (an estimate of the data precision matrix) using a Wishart distribution, with a mean given by a non-linear covariance function evaluated on the latents. This interpretation offers deeper theoretical and semantic insights into such algorithms by showing that the variances corresponding to these covariances are low (potentially misspecified), and by forging a connection to Gaussian process latent variable models through the observation that well-known kernels can be used to describe the covariances implied by graph Laplacians. We also introduce tools with which similar dimensionality reduction methods can be studied.
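
The model described in the abstract can be illustrated with a small numerical sketch. It is not the paper's implementation: the squared-exponential kernel, the jitter term, the k-nearest-neighbour graph construction, the degrees of freedom `nu`, and all function names are assumptions made for illustration. The objective is the Wishart log-density of a graph Laplacian `L` whose mean is set to a kernel matrix on the latents `X`, with terms that do not depend on `X` dropped (this also sidesteps the fact that a graph Laplacian is singular).

```python
import numpy as np

def rbf_kernel(X, lengthscale=1.0, jitter=1e-3):
    """Squared-exponential covariance on latents X (n x d); jitter keeps it invertible."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-0.5 * d2 / lengthscale**2) + jitter * np.eye(len(X))

def knn_graph_laplacian(Y, k=3):
    """Unnormalised Laplacian L = D - A of a symmetrised k-nearest-neighbour graph."""
    n = len(Y)
    d2 = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)           # exclude self-neighbours
    nbrs = np.argsort(d2, axis=1)[:, :k]   # indices of the k nearest points
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    A = np.maximum(A, A.T)                 # symmetrise the adjacency
    return np.diag(A.sum(axis=1)) - A

def wishart_objective(X, L, nu):
    """Log-density of L under Wishart(scale=M/nu, df=nu), M = k(X, X),
    keeping only the terms that depend on X:
        -(nu / 2) * (tr(M^{-1} L) + log|M|)
    """
    M = rbf_kernel(X)
    _, logdet = np.linalg.slogdet(M)
    return -0.5 * nu * (np.trace(np.linalg.solve(M, L)) + logdet)

rng = np.random.default_rng(0)
Y = rng.normal(size=(20, 5))   # toy high-dimensional data (hypothetical)
X = rng.normal(size=(20, 2))   # candidate 2-D latents
L = knn_graph_laplacian(Y, k=3)
value = wishart_objective(X, L, nu=len(Y) + 2)
```

Maximising `wishart_objective` over `X` (plus a log-prior on `X`, if one is used) would give a MAP estimate of the latents under these assumptions. The paper's claim is that UMAP and t-SNE approximately correspond to such an objective for a particular choice of covariance function.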

Problem

Research questions and friction points this paper is trying to address.

Reinterpret UMAP and t-SNE as MAP inference methods.

Link dimensionality reduction to Gaussian process latent variable models.

Introduce tools for studying similar dimensionality reduction techniques.

Innovation

Methods, ideas, or system contributions that make the work stand out.

UMAP and t-SNE as MAP inference methods

A Wishart distribution models the graph Laplacian

Connects to Gaussian process latent variable models

🔎 Similar Papers

HUMAP: Hierarchical Uniform Manifold Approximation and Projection