Novel GPU Boruta algorithms for feature selection from high-dimensional data

📅 2026-05-10

🤖 AI Summary
This study addresses the computational inefficiency of conventional CPU-based wrapper feature selection algorithms in high-dimensional, large-scale data scenarios by introducing the first GPU-parallelized implementation of the Boruta algorithm. Two accelerated variants are proposed: Boruta-Permut, based on permutation importance, and Boruta-TreeImp, leveraging impurity-reduction importance. Both variants achieve substantial speedups while preserving selection accuracy comparable to the original Boruta method. Experimental results further reveal that impurity-based importance measures may systematically overestimate the relevance of certain features. Comprehensive evaluations on multiple public and custom datasets demonstrate the efficiency and effectiveness of the proposed GPU-accelerated approaches.
📝 Abstract
Most feature selection algorithms, especially wrapper methods, run inefficiently on CPU-based platforms because of their high computational complexity, which makes them unsuitable for large-scale datasets. To address this challenge, this study proposes two GPU-accelerated versions of the Boruta feature selection procedure: Boruta-Permut, which relies on permutation-based feature importance, and Boruta-TreeImp, which uses importance based on impurity reduction. We evaluated both methods on a self-constructed dataset and several publicly available datasets. The results show that the proposed GPU-accelerated algorithms greatly improve computational efficiency while preserving feature selection accuracy comparable to that of the original Boruta algorithm. We also observe that the impurity-reduction-based version can overestimate the importance of some features. Overall, these findings suggest that running Boruta feature selection on GPUs offers an effective and cost-efficient solution for large-scale data analysis.
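For readers unfamiliar with Boruta, the core idea of comparing each real feature against shuffled "shadow" copies can be sketched as follows. This is a minimal CPU-only illustration, not the paper's GPU implementation. It makes several assumptions that are not from the source: the importance score is a stand-in (absolute Pearson correlation with the target) rather than random-forest permutation or impurity importance, the accept rule is a simple hit-rate threshold rather than the statistical test used by the original Boruta, and all function names and parameters (`abs_correlation`, `boruta_hits`, `select_features`, `min_hit_rate`) are hypothetical.

```python
import random
import statistics


def abs_correlation(column, target):
    """Stand-in importance score (assumption): |Pearson correlation| with the target."""
    mean_x = statistics.fmean(column)
    mean_y = statistics.fmean(target)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(column, target))
    var_x = sum((x - mean_x) ** 2 for x in column)
    var_y = sum((y - mean_y) ** 2 for y in target)
    if var_x == 0 or var_y == 0:
        return 0.0
    return abs(cov) / (var_x * var_y) ** 0.5


def boruta_hits(columns, target, n_rounds=50, seed=0):
    """Count, per feature, the rounds in which it beats the best shadow feature."""
    rng = random.Random(seed)
    hits = [0] * len(columns)
    for _ in range(n_rounds):
        # Shadow features: shuffled copies of the real columns, so any
        # relationship with the target is destroyed.
        shadows = [rng.sample(col, len(col)) for col in columns]
        best_shadow = max(abs_correlation(s, target) for s in shadows)
        for j, col in enumerate(columns):
            if abs_correlation(col, target) > best_shadow:
                hits[j] += 1
    return hits


def select_features(columns, target, n_rounds=50, min_hit_rate=0.9, seed=0):
    """Keep features that beat the best shadow in most rounds (simplified rule)."""
    hits = boruta_hits(columns, target, n_rounds, seed)
    return [j for j, h in enumerate(hits) if h / n_rounds >= min_hit_rate]


# Toy data: column 0 drives the target, columns 1 and 2 are pure noise.
data_rng = random.Random(42)
n_samples = 300
signal = [data_rng.gauss(0, 1) for _ in range(n_samples)]
noise_a = [data_rng.gauss(0, 1) for _ in range(n_samples)]
noise_b = [data_rng.gauss(0, 1) for _ in range(n_samples)]
target = [2 * s + data_rng.gauss(0, 0.5) for s in signal]

hits = boruta_hits([signal, noise_a, noise_b], target)
selected = select_features([signal, noise_a, noise_b], target)
print("hits per feature:", hits)
print("selected feature indices:", selected)
```

In this sketch only the shadow scores change between rounds, because the stand-in score is deterministic. With a random forest, as in Boruta, the model is refit each round, so the real-feature importances vary too, and that repeated refitting is the cost the GPU variants aim to reduce.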
Problem

Research questions and friction points this paper is trying to address.

feature selection
high-dimensional data
computational efficiency
large scale datasets
wrapper methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

GPU acceleration
Boruta algorithm
feature selection
high-dimensional data
impurity reduction
Xurui Li
Department of Chemical Engineering, Tsinghua University, Shuangqing Road, Beijing, 100084, China
Zhiguo Gan
Department of Chemical Engineering, Tsinghua University, Shuangqing Road, Beijing, 100084, China
Jiaming Zhang
Department of Chemical Engineering, Tsinghua University, Shuangqing Road, Beijing, 100084, China
Zheng Liu
Wuhan University, China
Single-Molecule Biophysics, Mechanobiology
Diannan Lu
Department of Chemical Engineering, Tsinghua University, Shuangqing Road, Beijing, 100084, China