Worse than Random: The Importance of a Baseline for Unsupervised Feature Selection

📅 2026-05-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the lack of a unified evaluation benchmark in unsupervised feature selection, which hinders objective assessment of method effectiveness. It systematically argues for the necessity of including random feature selection as a fundamental baseline and conducts a comprehensive comparison of multiple state-of-the-art unsupervised feature selection algorithms against random selection across diverse datasets, evaluating both clustering performance and computational efficiency. The experiments reveal that many advanced methods underperform random selection on key metrics, underscoring the urgency and importance of adopting this baseline. This work establishes a critical reference standard for future research in the field.

📝 Abstract

Many novel unsupervised feature selection methods are proposed each year, yet their empirical evaluation is limited to supervised and unsupervised evaluation metrics computed on selected datasets, along with comparisons to existing methods. In the absence of an established evaluation baseline, it is difficult to determine how much value each of these methods adds to the existing literature, or how effective its underlying approach is. We propose using random feature selection as a baseline for evaluating unsupervised feature selection methods. We empirically show that many state-of-the-art methods in unsupervised feature selection are outperformed by random feature selection in both performance and efficiency. Accordingly, we emphasize the strict requirement of including random feature selection as a baseline when developing novel unsupervised feature selection methods, to ensure a consistent improvement over it.
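
To make the proposed baseline concrete, the following is a minimal sketch of random feature selection used as a reference point. It is not the authors' code: the function names, the number of repeats, the one-standard-deviation margin, and the pluggable `score_fn` are all assumptions introduced for illustration. In practice `score_fn` would be whatever metric the evaluation already uses (for example, clustering quality on the reduced data).

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def random_feature_selection(n_features: int, k: int, seed: int) -> List[int]:
    """Pick k distinct feature indices uniformly at random (hypothetical helper)."""
    if not 0 < k <= n_features:
        raise ValueError("k must satisfy 1 <= k <= n_features")
    rng = random.Random(seed)  # local generator, so each seed is reproducible
    return sorted(rng.sample(range(n_features), k))


def project(X: Matrix, selected: Sequence[int]) -> List[List[float]]:
    """Keep only the selected columns of X."""
    return [[row[j] for j in selected] for row in X]


def random_baseline_score(
    X: Matrix,
    k: int,
    score_fn: Callable[[List[List[float]]], float],
    n_repeats: int = 20,  # assumed value, not taken from the paper
) -> Tuple[float, float]:
    """Mean and population std of score_fn over repeated random selections."""
    n_features = len(X[0])
    scores = [
        score_fn(project(X, random_feature_selection(n_features, k, seed)))
        for seed in range(n_repeats)
    ]
    return statistics.mean(scores), statistics.pstdev(scores)


def beats_random(
    method_score: float,
    baseline_mean: float,
    baseline_std: float,
    margin: float = 1.0,  # assumed threshold in units of baseline std
) -> bool:
    """True if a method's score exceeds the random baseline by the margin."""
    return method_score > baseline_mean + margin * baseline_std
```

A candidate method would be scored with the same `score_fn` and the same `k`, then compared against the baseline with `beats_random`. Averaging over several seeds matters because a single random draw can be unusually good or bad, and the comparison should reflect the baseline's typical behaviour rather than one sample.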

Problem

Research questions and friction points this paper is trying to address.

unsupervised feature selection

evaluation baseline

random feature selection

performance comparison

Innovation

Methods, ideas, or system contributions that make the work stand out.

unsupervised feature selection

random baseline

empirical evaluation

feature selection benchmark