HeFS: Helper-Enhanced Feature Selection via Pareto-Optimized Genetic Search

📅 2025-10-21

🤖 AI Summary

Feature selection in high-dimensional data is NP-hard, and heuristic or greedy searches tend to converge prematurely and to miss complex inter-feature dependencies. Helper-Enhanced Feature Selection (HeFS) addresses this by taking a feature subset produced by an existing algorithm and using a genetic algorithm to search the residual feature space for a complementary set of "Helper" features. HeFS combines bias-aware initialization, ratio-guided mutation, and Pareto-based multi-objective optimization to maximize classification accuracy and feature complementarity together. On 18 benchmark datasets, HeFS is reported to outperform state-of-the-art methods. It is further validated on real-world applications, including gastric cancer classification and drug toxicity prediction, which the authors present as evidence of generalizability. The source code and datasets are publicly available.

📝 Abstract

Feature selection is an NP-hard combinatorial optimization problem. Conventional approaches often employ heuristic or greedy strategies, which are prone to premature convergence and may fail to capture subtle yet informative features. This limitation becomes especially critical in high-dimensional datasets, where complex and interdependent feature relationships prevail. We introduce the HeFS (Helper-Enhanced Feature Selection) framework to refine feature subsets produced by existing algorithms. HeFS systematically searches the residual feature space to identify a Helper Set: features that complement the original subset and improve classification performance. The approach employs a biased initialization scheme and a ratio-guided mutation mechanism within a genetic algorithm, coupled with Pareto-based multi-objective optimization to jointly maximize predictive accuracy and feature complementarity. Experiments on 18 benchmark datasets demonstrate that HeFS consistently identifies overlooked yet informative features and achieves superior performance over state-of-the-art methods, including in challenging domains such as gastric cancer classification, drug toxicity prediction, and computer science applications. The code and datasets are available at https://healthinformaticslab.org/supp/.
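To make the Pareto-based selection step concrete, here is a minimal sketch of non-dominated filtering over two maximized objectives. It is an illustration only, not the authors' implementation: the names `dominates` and `pareto_front` and the example scores are assumptions, and the abstract does not say how accuracy or complementarity are actually computed.

```python
from typing import List, Tuple

# Assumed objective pair: (accuracy, complementarity), both to be maximized.
Objectives = Tuple[float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: List[Objectives]) -> List[int]:
    """Return the indices of candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical scores for four candidate Helper Sets.
candidates = [(0.90, 0.20), (0.85, 0.60), (0.80, 0.50), (0.92, 0.10)]
front = pareto_front(candidates)
print(front)  # [0, 1, 3]: candidate 2 is dominated by candidate 1
```

Keeping the whole non-dominated set, instead of collapsing the two objectives into one weighted score, avoids having to choose a trade-off weight between accuracy and complementarity in advance.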

Problem

Research questions and friction points this paper is trying to address.

Refining feature subsets to improve classification performance

Overcoming premature convergence in high-dimensional feature selection

Identifying complementary features using multi-objective genetic algorithms

Innovation

Methods, ideas, or system contributions that make the work stand out.

Helper Set identifies complementary features via genetic search

Biased initialization and ratio-guided mutation enhance optimization

Pareto-based multi-objective optimization balances accuracy and complementarity
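The summary names biased initialization and ratio-guided mutation without defining them, so the following is only one plausible reading, sketched under stated assumptions. It assumes a candidate Helper Set is a binary mask over the residual features, that each feature has a precomputed relevance weight, and that a target selection ratio steers mutation. The functions `biased_init` and `ratio_guided_mutation` and the parameters `weights`, `target_ratio`, and `rate` are all hypothetical and may differ from the paper's actual operators.

```python
import random
from typing import List


def biased_init(weights: List[float], target_ratio: float, rng: random.Random) -> List[int]:
    """Sample a mask where higher-weight features are more likely to be selected.

    Assumes a non-empty list of non-negative weights.
    """
    mean_w = sum(weights) / len(weights)
    mask = []
    for w in weights:
        # Scale the base selection probability by the feature's relative weight.
        p = min(1.0, target_ratio * w / mean_w) if mean_w > 0 else target_ratio
        mask.append(1 if rng.random() < p else 0)
    return mask


def ratio_guided_mutation(
    mask: List[int], target_ratio: float, rate: float, rng: random.Random
) -> List[int]:
    """Flip bits, favouring flips that move the selected fraction toward target_ratio."""
    ratio = sum(mask) / len(mask)
    # Too many selected features raises the drop probability; too few raises the add probability.
    p_drop = rate * (1.0 + max(0.0, ratio - target_ratio))
    p_add = rate * (1.0 + max(0.0, target_ratio - ratio))
    child = []
    for bit in mask:
        p = min(1.0, p_drop if bit else p_add)
        child.append(1 - bit if rng.random() < p else bit)
    return child


rng = random.Random(0)
weights = [0.9, 0.1, 0.5, 0.7, 0.2, 0.05]  # hypothetical relevance scores
parent = biased_init(weights, target_ratio=0.3, rng=rng)
child = ratio_guided_mutation(parent, target_ratio=0.3, rate=0.1, rng=rng)
```

In a full genetic search, children produced this way would be scored on both objectives and filtered by Pareto dominance before the next generation.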
