HeFS: Helper-Enhanced Feature Selection via Pareto-Optimized Genetic Search

📅 2025-10-21
🤖 AI Summary
Feature selection in high-dimensional data is NP-hard, and existing search strategies tend to converge prematurely and struggle to model complex inter-feature dependencies. To address these challenges, the paper proposes Helper-Enhanced Feature Selection (HeFS), a framework that refines a feature subset produced by an existing algorithm by using a genetic algorithm to search the residual feature space for a complementary set of “Helper” features. HeFS combines biased initialization, ratio-guided mutation, and Pareto-based multi-objective optimization to jointly maximize classification accuracy and feature complementarity. On 18 benchmark datasets, HeFS is reported to outperform state-of-the-art methods, including in real-world applications such as gastric cancer classification and drug toxicity prediction. The source code and datasets are publicly available.

📝 Abstract
Feature selection is a combinatorial optimization problem that is NP-hard. Conventional approaches often employ heuristic or greedy strategies, which are prone to premature convergence and may fail to capture subtle yet informative features. This limitation becomes especially critical in high-dimensional datasets, where complex and interdependent feature relationships prevail. We introduce the HeFS (Helper-Enhanced Feature Selection) framework to refine feature subsets produced by existing algorithms. HeFS systematically searches the residual feature space to identify a Helper Set - features that complement the original subset and improve classification performance. The approach employs a biased initialization scheme and a ratio-guided mutation mechanism within a genetic algorithm, coupled with Pareto-based multi-objective optimization to jointly maximize predictive accuracy and feature complementarity. Experiments on 18 benchmark datasets demonstrate that HeFS consistently identifies overlooked yet informative features and achieves superior performance over state-of-the-art methods, including in challenging domains such as gastric cancer classification, drug toxicity prediction, and computer science applications. The code and datasets are available at https://healthinformaticslab.org/supp/.
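The abstract names the search components but does not define them, so the following is only a rough sketch of how a Helper Set candidate might be encoded and varied. It assumes that the residual feature space is every feature outside the original subset, that the initialization bias is a per-feature relevance score in [0, 1], and that "ratio" refers to the fraction of residual features currently selected. The function names, the `scores` dictionary, and the `max_prob`, `rate`, and `target_ratio` parameters are all hypothetical and are not taken from the paper.

```python
import random

def residual_space(all_features, base_subset):
    """Features not already chosen by the upstream selector."""
    base = set(base_subset)
    return [f for f in all_features if f not in base]

def biased_init(residual, scores, rng, max_prob=0.5):
    """Build one candidate Helper Set as a bit mask over the residual space.

    ASSUMPTION: the bias is a per-feature relevance score in [0, 1];
    higher-scoring features are switched on more often.
    """
    return [1 if rng.random() < max_prob * scores[f] else 0 for f in residual]

def ratio_guided_mutation(mask, target_ratio, rng, rate=0.1):
    """Flip bits, steering the selected fraction toward target_ratio.

    ASSUMPTION: "ratio" means selected / total in the residual space,
    measured once on the parent mask before any bit is changed.
    """
    ratio = sum(mask) / len(mask)
    child = list(mask)
    for i, bit in enumerate(child):
        if rng.random() >= rate:
            continue              # this position is not mutated
        if ratio > target_ratio:
            child[i] = 0          # too many helpers: mutation only removes
        elif ratio < target_ratio:
            child[i] = 1          # too few helpers: mutation only adds
        else:
            child[i] = 1 - bit    # on target: plain bit flip
    return child

# Toy usage with made-up feature names and uniform scores.
rng = random.Random(0)
all_features = [f"f{i}" for i in range(10)]
base_subset = ["f0", "f1", "f2"]
residual = residual_space(all_features, base_subset)
scores = {f: 0.5 for f in residual}
mask = biased_init(residual, scores, rng)
child = ratio_guided_mutation(mask, target_ratio=0.3, rng=rng)
helper_set = [f for f, bit in zip(residual, child) if bit]
```

Encoding the Helper Set as a mask over the residual space only means a candidate can never duplicate a feature from the original subset, which matches the abstract's description of helpers as complements to that subset.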
Problem

Research questions and friction points this paper is trying to address.

Refining feature subsets to improve classification performance
Overcoming premature convergence in high-dimensional feature selection
Identifying complementary features using multi-objective genetic algorithms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Helper Set identifies complementary features via genetic search
Biased initialization and ratio-guided mutation enhance optimization
Pareto-based multi-objective optimization balances accuracy and complementarity
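To make the Pareto-based step concrete, here is a minimal sketch of non-dominated filtering over two maximized objectives. It assumes each candidate Helper Set has already been scored with an (accuracy, complementarity) pair; how the paper measures complementarity is not stated on this page, so the numbers and candidate names below are purely illustrative.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """Return the candidates whose objective tuples no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(o["objectives"], c["objectives"]) for o in candidates)
    ]

# Hypothetical (accuracy, complementarity) scores; both are maximized.
candidates = [
    {"name": "A", "objectives": (0.90, 0.20)},
    {"name": "B", "objectives": (0.85, 0.40)},
    {"name": "C", "objectives": (0.80, 0.30)},
    {"name": "D", "objectives": (0.90, 0.10)},
]
front = pareto_front(candidates)
front_names = [c["name"] for c in front]
```

In this toy example A and B survive: C is worse than B on both objectives, and D matches A on accuracy but has lower complementarity. Keeping the whole front rather than a single weighted score lets the search retain candidates that trade some accuracy for more complementarity.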
Yusi Fan
College of Computer Science and Technology, Jilin University, Changchun, China, 130012
Tian Wang
Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China, 999077
Zhiying Yan
Department of Oncology, The Second People's Hospital of Changzhou, the Third Affiliated Hospital of Nanjing Medical University, Changzhou, China, 213003
Chang Liu
Beijing Life Science Academy, Beijing, China, 102209
Qiong Zhou
College of Computer Science and Technology, Jilin University, Changchun, China, 130012
Qi Lu
College of Computer Science and Technology, Jilin University, Changchun, China, 130012
Zhehao Guo
University of Pittsburgh
Ziqi Deng
School of Science, The Hong Kong University of Science and Technology, Hong Kong, China, 999077
Wenyu Zhu
Department of Oncology, The Second People's Hospital of Changzhou, the Third Affiliated Hospital of Nanjing Medical University, Changzhou, China, 213003
Ruochi Zhang
College of Computer Science and Technology, Jilin University, Changchun, China, 130012
Fengfeng Zhou
Bioinformatics, Data Analytics
Big data, feature engineering and selection, health informatics, bioinformatics, data mining