🤖 AI Summary
Manual tuning of local neighborhood size and embedding dimension in nonlinear dimensionality reduction (NLDR) compromises robustness and generalizability. Method: We propose a generic adaptive neighborhood selection framework grounded in nonparametric intrinsic dimension estimation. For the first time, it jointly couples intrinsic dimension estimation with local structure modeling to automatically determine the optimal neighborhood size and simultaneously infer the appropriate low-dimensional embedding dimension. This enables end-to-end hyperparameter optimization for neighborhood-dependent NLDR algorithms—including t-SNE and UMAP. Results: Extensive experiments on diverse real-world and synthetic datasets demonstrate substantial improvements in visualization interpretability and downstream classification/clustering performance. Quantitative metrics—including k-NN accuracy, trustworthiness, and continuity—improve by 12.6%–28.4% on average, validating the framework’s effectiveness and broad applicability across NLDR methods.
📝 Abstract
Dimensionality reduction is a fundamental task in modern data science. Several projection methods specifically tailored to take into account the non-linearity of the data via local embeddings have been proposed. Such methods are often based on local neighbourhood structures and require tuning the number of neighbours that define this local structure, and the dimensionality of the lower-dimensional space onto which the data are projected. Such choices critically influence the quality of the resulting embedding. In this paper, we exploit a recently proposed intrinsic dimension estimator which also returns the optimal locally adaptive neighbourhood sizes according to some desirable criteria. In principle, this adaptive framework can be employed to perform an optimal hyper-parameter tuning of any dimensionality reduction algorithm that relies on local neighbourhood structures. Numerical experiments on both real-world and simulated datasets show that the proposed method can be used to significantly improve well-known projection methods when employed for various learning tasks, with improvements measurable through both quantitative metrics and the quality of low-dimensional visualizations.