🤖 AI Summary
Problem: Conventional model-agnostic feature selection methods neglect nonlinear interdependencies among features, leading to inferior performance compared to model-based approaches. Method: This paper proposes EasyFS, a framework that preserves model independence and computational efficiency while pioneering the application of coding rate reduction theory to quantify feature redundancy. It introduces a stochastic nonlinear projection network to enable flexible, high-order feature space transformations in an unsupervised setting. Contribution/Results: Evaluated on 21 benchmark datasets, EasyFS improves on state-of-the-art methods by up to 10.9% in regression tasks and up to 5.7% in classification tasks, while reducing computation time by more than 94%. By unifying theoretical rigor with practical efficiency, EasyFS establishes a novel paradigm for model-agnostic feature selection.
📝 Abstract
Traditional model-free feature selection methods treat each feature independently and disregard the interrelationships among features, which leads to relatively poor performance compared with model-aware methods. To address this challenge, we propose EasyFS, an efficient model-free feature selection framework based on elastic expansion and compression of the features. EasyFS achieves better performance than state-of-the-art model-aware methods while retaining the efficiency and flexibility of existing model-free methods. In particular, EasyFS expands the feature space with a random non-linear projection network that forms non-linear combinations of the original features, so as to model the interrelationships among the features and discover the most correlated features. Meanwhile, a novel redundancy measurement based on the change in coding rate is proposed for efficient filtering of redundant features. Comprehensive experiments on 21 different datasets show that EasyFS outperforms state-of-the-art methods by up to 10.9% in the regression tasks and 5.7% in the classification tasks while saving more than 94% of the time.
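
The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the coding rate definition commonly used in the rate reduction literature, R(Z) = 1/2 log det(I + d/(n eps^2) Z^T Z) for n samples and d features, and it treats the drop in coding rate when one feature is removed as a stand-in redundancy score. The function names, the `tanh` activation, the hidden width, and the value of `eps` are all hypothetical choices made for illustration.

```python
import numpy as np


def random_nonlinear_expand(X, n_hidden=64, seed=0):
    """Expand features with a fixed, untrained random projection followed by tanh.

    Assumption: a single random layer; the paper's network may differ.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    W = rng.standard_normal((n_features, n_hidden)) / np.sqrt(n_features)
    b = rng.standard_normal(n_hidden)
    return np.tanh(X @ W + b)


def coding_rate(Z, eps=0.5, d=None):
    """Coding rate 1/2 * logdet(I + d / (n * eps^2) * Z^T Z) for Z of shape (n, k).

    `d` sets the scaling dimension; it defaults to the number of columns of Z.
    """
    n, k = Z.shape
    if d is None:
        d = k
    scale = d / (n * eps ** 2)
    # slogdet is numerically safer than log(det(...)) for larger matrices.
    _, logdet = np.linalg.slogdet(np.eye(k) + scale * (Z.T @ Z))
    return 0.5 * logdet


def coding_rate_gain(Z, j, eps=0.5):
    """Drop in coding rate when column j is removed (same scaling for both terms).

    Assumption: a small drop is read as "column j is largely redundant".
    """
    d = Z.shape[1]
    reduced = np.delete(Z, j, axis=1)
    return coding_rate(Z, eps, d) - coding_rate(reduced, eps, d)


# Toy data: column 1 duplicates column 0, column 2 is independent.
rng = np.random.default_rng(0)
a = rng.standard_normal(500)
c = rng.standard_normal(500)
Z = np.column_stack([a, a, c])

gain_duplicate = coding_rate_gain(Z, 1)
gain_independent = coding_rate_gain(Z, 2)
expanded = random_nonlinear_expand(Z, n_hidden=8)
```

In this toy setup the duplicated column contributes less coding rate than the independent one, which is the behaviour a redundancy filter of this kind would rely on. How EasyFS combines the expanded features with the redundancy score is not specified in the abstract, so the sketch leaves the two steps separate.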