🤖 AI Summary
This study addresses the search bias and performance variations in multi-objective unsupervised feature selection arising from differences in objective function design, regularization direction for subset size, and initialization strategies. Through systematic evaluation on synthetic data, the authors investigate six objective combinations that jointly minimize or maximize subset size alongside accuracy, silhouette coefficient, or PCA reconstruction loss. The findings reveal the critical influence of the chosen objectives on the quality of the Pareto front and search dynamics. Notably, using PCA reconstruction loss as the primary objective efficiently yields compact feature subsets with strong predictive performance—comparable to methods directly optimizing supervised accuracy—whereas objectives based on the silhouette coefficient tend to converge to trivial, low-cardinality solutions with poor generalization.
📝 Abstract
Unsupervised feature selection is commonly formulated as a multiobjective optimisation problem that jointly optimises subset quality and subset size. Yet the behaviour of this formulation depends critically on the choice of evaluation objective, the direction of subset-size regularisation, and the initialisation strategy. We study these factors in a controlled setting using a synthetic dataset with known informative, redundant, and irrelevant feature types. Six formulations are compared, each pairing one of three evaluation objectives (accuracy, silhouette score, or PCA reconstruction loss) with either subset-size minimisation or subset-size maximisation. The results show that the formulation strongly affects both search dynamics and the quality of the resulting Pareto front. Silhouette-based formulations exhibit a strong bias toward trivial low-cardinality solutions and remain weak proxies for predictive performance. In contrast, the proposed PCA loss objective produces compact subsets with test accuracy comparable to that of subsets obtained by directly optimising supervised accuracy. Overall, the study shows that objective design is central to effective multiobjective unsupervised feature selection.
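To make the formulation concrete, the sketch below builds a small synthetic dataset with informative, redundant, and irrelevant features, enumerates the six objective/size-direction pairings, and evaluates a two-objective vector for a candidate subset. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not define the PCA reconstruction loss, so the version here (mean squared error when all features are linearly reconstructed from the top principal components of the selected features) is assumed, and the names `pca_reconstruction_loss`, `objective_vector`, and `n_components` are hypothetical. Only the PCA-loss objective is implemented; accuracy and silhouette score would slot into the same position.

```python
import itertools
import numpy as np

# Synthetic data with known feature roles (sizes and weights are illustrative).
rng = np.random.default_rng(0)
n_samples = 200
informative = rng.normal(size=(n_samples, 2))
mixing = np.array([[1.0, 0.5], [-0.5, 1.0]])
redundant = informative @ mixing            # exact linear copies of the informative features
irrelevant = rng.normal(size=(n_samples, 4))  # pure noise
X = np.hstack([informative, redundant, irrelevant])

# Six formulations: three evaluation objectives x two size directions.
formulations = list(itertools.product(
    ["accuracy", "silhouette", "pca_loss"], ["min_size", "max_size"]))


def pca_reconstruction_loss(X, mask, n_components=2):
    """Assumed definition: MSE of reconstructing all centred features
    from the top principal components of the selected features."""
    Xc = X - X.mean(axis=0)
    S = Xc[:, mask]
    if S.shape[1] == 0:
        return float(np.mean(Xc ** 2))  # an empty subset reconstructs nothing
    k = min(n_components, S.shape[1])
    # Principal component scores of the selected columns via SVD.
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    Z = U[:, :k] * s[:k]
    # Least-squares map from the scores back to the full feature space.
    W, *_ = np.linalg.lstsq(Z, Xc, rcond=None)
    return float(np.mean((Xc - Z @ W) ** 2))


def objective_vector(X, mask, size_direction):
    """Return (quality, size term), both expressed as quantities to minimise."""
    size = int(np.sum(mask))
    size_term = size if size_direction == "min_size" else -size
    return pca_reconstruction_loss(X, mask), size_term


informative_mask = np.array([1, 1, 0, 0, 0, 0, 0, 0], dtype=bool)
irrelevant_mask = np.array([0, 0, 0, 0, 1, 1, 0, 0], dtype=bool)

for name, mask in [("informative", informative_mask), ("irrelevant", irrelevant_mask)]:
    print(name, objective_vector(X, mask, "min_size"))
```

Under this assumed loss, the two informative features reconstruct themselves and their redundant copies exactly, so they score a lower loss than an equally sized subset of noise features. Negating the size term is one simple way to express subset-size maximisation in a minimisation-only optimiser; the paper may encode the direction differently.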