🤖 AI Summary
Nonlinear dimensionality reduction (NLDR) methods such as t-SNE and UMAP often exaggerate random patterns, and their results are highly sensitive to the choice of method and hyperparameters, which makes it hard to judge which representation to trust. To address this, we propose an algorithm that displays the low-dimensional NLDR model in the original high-dimensional space and views it with a tour, an animated sequence of linear projections, so that model fit can be inspected visually across subspaces. The display shows whether the model fits the data everywhere, fits better in some subspaces, or mismatches the data entirely. It also supports comparison across methods and hyperparameter settings, revealing where different methods produce similar summaries or similar quirks.
📝 Abstract
Nonlinear dimension reduction (NLDR) techniques such as t-SNE and UMAP provide a low-dimensional representation of high-dimensional (p-D) data by applying a nonlinear transformation. NLDR often exaggerates random patterns, but NLDR views still have an important role in data analysis because, if done well, they provide a concise visual (and conceptual) summary of p-D distributions. Different NLDR methods and hyper-parameter choices can create wildly different representations, making it difficult to decide which is best, or whether any or all of them are accurate or misleading. To help assess the NLDR and decide which representation, if any, most reasonably reflects the structure(s) present in the p-D data, we have developed an algorithm to show the low-dimensional NLDR model in the p-D space, viewed with a tour, a movie of linear projections. From this, one can see whether the model fits everywhere, fits better in some subspaces, or completely mismatches the data. One can also see how different methods may have similar summaries or quirks.
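The abstract does not spell out how the low-dimensional model is placed in the p-D space, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes a 2-D embedding, bins it on a square grid, and represents each occupied cell by the mean of its points in both spaces. The function names `lift_embedding` and `random_frames`, the square grid, and the `bins` parameter are all hypothetical. The second function draws independent random orthonormal projection frames; a real tour would interpolate smoothly between frames, which is omitted here.

```python
import numpy as np


def lift_embedding(X, E, bins=10):
    """Summarise a 2-D embedding by grid cells and lift each cell to p-D.

    X: (n, p) original data. E: (n, 2) embedding of the same n points.
    Returns (cent_2d, cent_pd): per-cell means in the 2-D and p-D spaces.
    Assumption: a square grid stands in for whatever binning the paper uses.
    """
    X = np.asarray(X, dtype=float)
    E = np.asarray(E, dtype=float)

    # Rescale each embedding axis to [0, 1]; guard against a constant axis.
    lo = E.min(axis=0)
    span = E.max(axis=0) - lo
    span[span == 0] = 1.0
    scaled = (E - lo) / span

    # Integer cell index per point; points on the upper edge go in the last cell.
    cells = np.minimum((scaled * bins).astype(int), bins - 1)
    cell_id = cells[:, 0] * bins + cells[:, 1]

    # Group points by occupied cell and average them in both spaces.
    ids, inverse = np.unique(cell_id, return_inverse=True)
    inverse = inverse.ravel()
    counts = np.bincount(inverse, minlength=len(ids))
    cent_2d = np.zeros((len(ids), 2))
    cent_pd = np.zeros((len(ids), X.shape[1]))
    np.add.at(cent_2d, inverse, E)
    np.add.at(cent_pd, inverse, X)
    return cent_2d / counts[:, None], cent_pd / counts[:, None]


def random_frames(p, n_frames, seed=0):
    """Return n_frames random (p, 2) matrices with orthonormal columns."""
    rng = np.random.default_rng(seed)
    frames = []
    for _ in range(n_frames):
        # QR of a Gaussian matrix gives an orthonormal basis for a random plane.
        q, _ = np.linalg.qr(rng.standard_normal((p, 2)))
        frames.append(q)
    return frames
```

To compare model and data in one view, project both through the same frame `F`, for example `X @ F` for the data and `cent_pd @ F` for the lifted model points, and repeat over the sequence of frames. Where the lifted points sit away from the data in some projection, the embedding is a poor summary in that subspace.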