🤖 AI Summary
Mainstream dimensionality reduction methods such as UMAP and t-SNE lack a unified probabilistic interpretation, hindering theoretical understanding and systematic comparison.
Method: We propose ProbDR, a probabilistic dimensionality reduction framework that recasts these methods as maximum a posteriori (MAP) inference in a common generative model. The model describes the graph Laplacian (an estimate of the data precision matrix) with a Wishart distribution whose mean is given by a nonlinear covariance function evaluated on the latents. This formulation uncovers the implicit low-variance assumption these methods place on the underlying low-dimensional manifold and establishes a rigorous theoretical connection to Gaussian process latent variable models.
Contribution/Results: (1) ProbDR provides the first unified probabilistic generative perspective encompassing multiple classical algorithms; (2) it endows reduced representations with statistically interpretable semantics; and (3) it enables principled, cross-algorithm analysis—supported by an open-source, general-purpose analytical toolkit.
📝 Abstract
This paper shows that dimensionality reduction methods such as UMAP and t-SNE can be approximately recast as MAP inference under a model, introduced as ProbDR, that describes the graph Laplacian (an estimate of the data precision matrix) using a Wishart distribution whose mean is given by a non-linear covariance function evaluated on the latents. This interpretation offers deeper theoretical and semantic insight into such algorithms by showing that the variances corresponding to these covariances are low (and potentially misspecified), and it forges a connection to Gaussian process latent variable models by showing that well-known kernels can describe the covariances implied by graph Laplacians. We also introduce tools with which similar dimensionality reduction methods can be studied.
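To make the model concrete, here is a minimal sketch of the Wishart MAP objective described above: the graph Laplacian is treated as a Wishart-distributed observation whose scale is tied to a covariance function evaluated on the latents, and the objective collects the terms of the Wishart log-density that depend on the latents. The RBF kernel, the degrees-of-freedom parameter `nu`, and the function names are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np

def rbf_cov(Z, lengthscale=1.0, jitter=1e-6):
    # Nonlinear covariance evaluated on latents Z (n x q); an RBF kernel
    # is used here purely for illustration.
    sq = np.sum(Z**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    K = np.exp(-0.5 * d2 / lengthscale**2)
    return K + jitter * np.eye(len(Z))  # jitter for numerical stability

def wishart_map_objective(Z, L, nu):
    # Negative log-likelihood (up to Z-independent constants) of
    # L ~ Wishart(K(Z)^{-1}, nu), where L is the graph Laplacian
    # serving as an estimate of the data precision matrix.
    # log p(L | Z) = (nu/2) log|K| - (1/2) tr(K L) + const.
    K = rbf_cov(Z)
    _, logdet = np.linalg.slogdet(K)
    return -(0.5 * nu * logdet - 0.5 * np.trace(K @ L))
```

Minimizing this objective over `Z` (plus a log-prior on the latents) would give the MAP estimate; the paper's point is that algorithms like UMAP and t-SNE approximately perform such an optimization.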