Novel GPU Boruta algorithms for feature selection from high-dimensional data

📅 2026-05-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the computational inefficiency of conventional CPU-based wrapper feature selection algorithms in high-dimensional, large-scale data scenarios by introducing the first GPU-parallelized implementation of the Boruta algorithm. Two accelerated variants are proposed: Boruta-Permut, based on permutation importance, and Boruta-TreeImp, leveraging impurity-reduction importance. Both variants achieve substantial speedups while preserving selection accuracy comparable to the original Boruta method. Experimental results further reveal that impurity-based importance measures may systematically overestimate the relevance of certain features. Comprehensive evaluations on multiple public and custom datasets demonstrate the efficiency and effectiveness of the proposed GPU-accelerated approaches.

📝 Abstract

Most feature selection algorithms, especially wrapper methods, run inefficiently on CPU-based platforms because of their high computational complexity, which makes them unsuitable for processing large-scale datasets. To address this challenge, the present study proposes two GPU-accelerated versions of the Boruta feature selection procedure: Boruta-Permut, which relies on permutation-based feature importance, and Boruta-TreeImp, which employs importance based on impurity reduction. To evaluate these methods, we conducted experiments on both a self-constructed dataset and several publicly available datasets. The experimental results show that the proposed GPU-accelerated algorithms greatly improve computational efficiency while preserving feature selection accuracy comparable to the original Boruta algorithm. In our analysis, we also observe that the impurity-reduction-based version can overestimate the importance of some features. Overall, these findings suggest that performing Boruta feature selection on GPUs offers an effective and cost-efficient solution for large-scale data analysis.
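
For readers unfamiliar with Boruta, the sketch below illustrates the shadow-feature comparison that the procedure is built around: each real feature is scored against shuffled copies of the features, and a feature earns a "hit" in every round where it beats the best shadow. This is not the paper's GPU implementation. The importance score here is a plain absolute Pearson correlation, swapped in as a stand-in for the permutation-based or impurity-based tree importance used by Boruta-Permut and Boruta-TreeImp, and the names `abs_corr` and `boruta_hits`, the toy data, and the round count are all assumptions made for illustration. The statistical test that Boruta applies to the hit counts is omitted.

```python
import random
from statistics import mean


def abs_corr(xs, ys):
    """Absolute Pearson correlation, used here as a stand-in importance score.

    Assumption: the real algorithms use tree-based importance instead.
    """
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5


def boruta_hits(columns, target, n_rounds=20, seed=0):
    """Count, per feature, the rounds in which it beats the best shadow feature."""
    rng = random.Random(seed)
    hits = [0] * len(columns)
    for _ in range(n_rounds):
        # Shadow features: shuffled copies of the real columns, so any
        # relationship with the target is destroyed.
        shadows = [rng.sample(col, len(col)) for col in columns]
        threshold = max(abs_corr(s, target) for s in shadows)
        for j, col in enumerate(columns):
            if abs_corr(col, target) > threshold:
                hits[j] += 1
    return hits


# Toy data (hypothetical): only x0 drives the target; x1 and x2 are noise.
data_rng = random.Random(42)
n = 200
x0 = [data_rng.gauss(0, 1) for _ in range(n)]
x1 = [data_rng.gauss(0, 1) for _ in range(n)]
x2 = [data_rng.gauss(0, 1) for _ in range(n)]
y = [2.0 * a + data_rng.gauss(0, 0.5) for a in x0]

hits = boruta_hits([x0, x1, x2], y)
print(hits)  # one hit count per feature, each between 0 and n_rounds
```

In this toy setup the informative feature `x0` should win every round, while the noise features win only occasionally. The repeated rescoring of real and shadow features across rounds is the part of the procedure that dominates the cost on large datasets, which is the workload the paper moves to the GPU.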

Problem

Research questions and friction points this paper is trying to address.

feature selection

high-dimensional data

computational efficiency

large-scale datasets

wrapper methods

Innovation

Methods, ideas, or system contributions that make the work stand out.

GPU acceleration

Boruta algorithm

feature selection