🤖 AI Summary
This study addresses the lack of an established evaluation baseline in unsupervised feature selection, which hinders objective assessment of method effectiveness. It argues that random feature selection should be included as a fundamental baseline and compares multiple state-of-the-art unsupervised feature selection algorithms against random selection across diverse datasets, evaluating both clustering performance and computational efficiency. The experiments reveal that many advanced methods are outperformed by random selection in both performance and efficiency, underscoring the importance of adopting this baseline. The work proposes random selection as a reference standard for future research in the field.
📝 Abstract
Many novel unsupervised feature selection methods are proposed each year, yet their empirical evaluation is limited to supervised and unsupervised evaluation metrics computed on selected datasets, along with comparisons to existing methods. In the absence of an established evaluation baseline, it is difficult to determine how much value each of these methods adds to the existing literature and how effective its underlying approach is. We propose using random feature selection as a baseline for evaluating unsupervised feature selection methods. We empirically show that many state-of-the-art methods in unsupervised feature selection are outperformed by random feature selection in both performance and efficiency. Accordingly, we stress that random feature selection must be considered as a baseline when developing novel unsupervised feature selection methods, so that each new method demonstrates a consistent improvement over it.
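As an illustration of what such a baseline could look like in practice, the following is a minimal sketch, not the authors' implementation. It draws feature subsets uniformly at random and averages a caller-supplied score over several draws. The names `random_feature_subset`, `project`, `random_baseline`, and `score_fn`, as well as the default of 20 repeats, are assumptions introduced here for illustration only.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]  # rows are samples, columns are features


def random_feature_subset(n_features: int, k: int, seed: int) -> List[int]:
    """Draw k distinct feature indices uniformly at random (hypothetical helper)."""
    if not 0 < k <= n_features:
        raise ValueError("k must satisfy 0 < k <= n_features")
    rng = random.Random(seed)  # local generator keeps each draw reproducible
    return sorted(rng.sample(range(n_features), k))


def project(X: Matrix, columns: Sequence[int]) -> List[List[float]]:
    """Keep only the selected feature columns of X."""
    return [[row[j] for j in columns] for row in X]


def random_baseline(
    X: Matrix,
    k: int,
    score_fn: Callable[[List[List[float]]], float],
    n_repeats: int = 20,  # assumed default, not taken from the paper
) -> Tuple[float, float]:
    """Return the mean and standard deviation of score_fn over random subsets.

    score_fn is supplied by the caller and maps a reduced data matrix to a
    single number, for example a clustering quality score.
    """
    n_features = len(X[0])
    scores = [
        score_fn(project(X, random_feature_subset(n_features, k, seed)))
        for seed in range(n_repeats)
    ]
    return statistics.mean(scores), statistics.pstdev(scores)
```

Under these assumptions, a proposed method's score for a given `k` would be compared against the mean returned by `random_baseline` for the same `k` and the same `score_fn`, with the standard deviation indicating how much of any gap could be explained by the randomness of the draw.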