🤖 AI Summary
In high-dimensional unsupervised clustering, the absence of ground-truth labels makes it hard to identify discriminative features. To address this, we propose an unsupervised feature selection method that jointly learns pseudo-labels and selects discriminative features. The approach integrates local manifold structure, which preserves neighborhood geometry, with global self-representation correlations, which capture relationships among samples. Using self-representation regularization, sparsity constraints, and an iterative optimization algorithm embedded in a spectral clustering framework, it learns robust pseudo-labels and an interpretable feature subset at the same time. Experiments on synthetic data and two real-world high-dimensional datasets show that it improves on state-of-the-art methods in both clustering accuracy and feature selection accuracy.
📝 Abstract
Identifying discriminative features is important for high-dimensional clustering. However, because cluster labels are unavailable, the regularization methods developed for supervised feature selection cannot be applied directly. To learn pseudo-labels and select discriminative features simultaneously, we propose a new unsupervised feature selection method, named GlObal and Local information combined Feature Selection (GOLFS), for high-dimensional clustering problems. The GOLFS algorithm selects discriminative features by combining the local geometric structure of the samples, obtained via manifold learning, with their global correlation structure, obtained via regularized self-representation. Exploiting both sources of information improves the accuracy of feature selection and of clustering. In addition, an iterative algorithm is proposed to solve the optimization problem, and its convergence is proved. Simulations and two real data applications demonstrate the excellent finite-sample performance of GOLFS in both feature selection and clustering.
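To make the combination of local and global information concrete, the following is a minimal NumPy sketch of the general idea, not the GOLFS algorithm itself. It assumes a symmetrized k-nearest-neighbor graph for the local structure, a ridge-regularized self-representation for the global structure, a spectral embedding of the combined Laplacian as relaxed pseudo-labels, and an l2,1-penalized regression solved by iteratively reweighted least squares for feature scoring. All function names and the parameters `k`, `lam`, `alpha`, and `gamma` are hypothetical, and the single-pass structure is a simplification of the joint iterative optimization described in the abstract.

```python
import numpy as np


def knn_laplacian(X, k=7):
    """Unnormalized Laplacian of a symmetrized kNN graph (local structure)."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared distances
    np.fill_diagonal(d2, np.inf)                      # exclude self-neighbors
    nbrs = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    W = np.maximum(W, W.T)                            # symmetrize the graph
    return np.diag(W.sum(axis=1)) - W


def self_representation_laplacian(X, lam=1.0):
    """Laplacian of a graph built from ridge self-representation (global structure).

    With samples as columns of A = X^T, min_Z ||A - A Z||_F^2 + lam ||Z||_F^2
    has the closed form Z = (G + lam I)^{-1} G, where G = X X^T.
    """
    n = X.shape[0]
    G = X @ X.T
    Z = np.linalg.solve(G + lam * np.eye(n), G)
    # Assumption of this sketch: keep only positive coefficients as similarities.
    S = np.maximum(0.5 * (Z + Z.T), 0.0)
    np.fill_diagonal(S, 0.0)
    return np.diag(S.sum(axis=1)) - S


def select_features(X, n_clusters, alpha=0.5, gamma=0.1, n_iter=20, eps=1e-8):
    """Return (feature scores, relaxed pseudo-labels) for data X of shape (n, d)."""
    Xc = X - X.mean(axis=0)                           # center each feature
    L = knn_laplacian(Xc) + alpha * self_representation_laplacian(Xc)
    _, vecs = np.linalg.eigh(L)                       # eigenvalues in ascending order
    F = vecs[:, :n_clusters]                          # relaxed pseudo-labels

    # min_W ||Xc W - F||_F^2 + gamma * ||W||_{2,1} via reweighted least squares.
    D = np.eye(Xc.shape[1])
    XtX, XtF = Xc.T @ Xc, Xc.T @ F
    for _ in range(n_iter):
        W = np.linalg.solve(XtX + gamma * D, XtF)
        row_norms = np.linalg.norm(W, axis=1)
        D = np.diag(1.0 / (2.0 * row_norms + eps))
    return row_norms, F                               # larger score = more relevant


def make_toy_data(n=200, d=20, shift=3.0, seed=0):
    """Two clusters that differ only in the first two features."""
    rng = np.random.default_rng(seed)
    labels = np.repeat([0, 1], n // 2)
    X = rng.normal(size=(n, d))
    X[:, :2] += np.where(labels[:, None] == 0, -shift, shift)
    return X, labels
```

A possible usage is `X, _ = make_toy_data()` followed by `scores, F = select_features(X, n_clusters=2)`; ranking features by `scores` should then favor the two informative columns on this toy data. The weight `alpha` controls how much the global graph contributes relative to the local one, which is the trade-off the abstract refers to when it speaks of combining both kinds of information.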