🤖 AI Summary
To reduce the trial-and-error cost and computational overhead of selecting dimensionality reduction techniques and tuning their hyperparameters, this paper proposes a dataset-adaptive optimization framework driven by structural complexity. The method introduces, for the first time, a formal definition and quantitative measure of intrinsic data structural complexity, derived from projection error analysis and manifold geometric modeling, which enables *a priori* assessment of dimensionality reduction efficacy. This complexity metric guides automated algorithm selection (e.g., PCA, t-SNE, UMAP) and hyperparameter configuration, eliminating futile trials. Experiments across multiple benchmark datasets show that the proposed metric accurately approximates ground-truth data complexity and that the workflow reduces hyperparameter search time by 72% on average while preserving the fidelity of the reduced-dimensional representations.
📝 Abstract
Selecting an appropriate dimensionality reduction (DR) technique and determining the hyperparameter settings that maximize the accuracy of its output projections typically involve extensive trial and error, often resulting in unnecessary computational overhead. To address this challenge, we propose a dataset-adaptive approach to DR optimization guided by structural complexity metrics. These metrics quantify the intrinsic complexity of a dataset, predicting whether higher-dimensional spaces are necessary to represent it accurately. Since complex datasets are often inaccurately represented in two-dimensional projections, leveraging these metrics enables us to predict the maximum achievable accuracy of DR techniques for a given dataset, eliminating redundant trials in optimizing DR. We introduce the design and theoretical foundations of these structural complexity metrics. We quantitatively verify that our metrics effectively approximate the ground-truth complexity of datasets and confirm their suitability for guiding a dataset-adaptive DR workflow. Finally, we empirically show that our dataset-adaptive workflow significantly enhances the efficiency of DR optimization without compromising accuracy.
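As a rough illustration of how a predicted accuracy ceiling can eliminate redundant trials, the sketch below runs a random hyperparameter search that stops as soon as a trial comes within a tolerance of that ceiling. It is a minimal sketch under stated assumptions, not the paper's implementation: the names `adaptive_search`, `toy_evaluate`, and `toy_sample`, the `tolerance` parameter, and the toy scoring function are all hypothetical, and `predicted_max` stands in for whatever value a structural complexity metric would supply.

```python
import random
from typing import Callable, Dict, Tuple

Params = Dict[str, float]


def adaptive_search(
    evaluate: Callable[[Params], float],
    sample_params: Callable[[random.Random], Params],
    predicted_max: float,
    tolerance: float = 0.05,
    max_trials: int = 50,
    seed: int = 0,
) -> Tuple[Params, float, int]:
    """Random search that stops early near a predicted accuracy ceiling.

    predicted_max is assumed to come from a complexity metric (hypothetical
    here); the search ends once the best score is within `tolerance` of it.
    """
    rng = random.Random(seed)
    best_params: Params = {}
    best_score = float("-inf")
    trials = 0
    for trials in range(1, max_trials + 1):
        params = sample_params(rng)
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
        # Further trials are unlikely to help once we are near the ceiling.
        if best_score >= predicted_max - tolerance:
            break
    return best_params, best_score, trials


# Toy stand-ins (assumptions): a fake accuracy score that peaks at 0.8
# when a single "perplexity"-like hyperparameter equals 30.
def toy_evaluate(params: Params) -> float:
    return 0.8 - abs(params["perplexity"] - 30.0) / 100.0


def toy_sample(rng: random.Random) -> Params:
    return {"perplexity": rng.uniform(5.0, 100.0)}


if __name__ == "__main__":
    params, score, trials = adaptive_search(
        toy_evaluate, toy_sample, predicted_max=0.8
    )
    print(f"best={params}, score={score:.3f}, trials used={trials}")
```

In a real workflow, `toy_evaluate` would be replaced by running a DR technique and scoring its projection, and the saving comes from the trials that are never run after the stopping condition is met.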