A more efficient method for large-sample model-free feature screening via multi-armed bandits

📅 2025-09-19
🤖 AI Summary
To address the computational bottleneck of model-agnostic feature screening for ultra-large-scale, ultra-high-dimensional data, this paper proposes CR-SIS, a rank-correlation-based screening method, and its efficient online variant, BanditCR-SIS. Innovatively integrating the multi-armed bandit framework into feature ranking, the approach combines Chatterjee’s rank correlation coefficient, randomized sampling, and dynamic priority scheduling. It guarantees theoretical convergence and stability while reducing time complexity from $O(n \log n \cdot p)$ to $O(\sqrt{n} \log n \cdot p + n \log n)$. Experiments on synthetic and real-world datasets demonstrate that the method achieves high screening accuracy with minimal computational overhead. Notably, it enables the first scalable, online feature screening for high-dimensional data, establishing a novel, extensible paradigm for detecting nonlinear relationships in massive datasets.

📝 Abstract
We consider model-free feature screening in large-scale, ultrahigh-dimensional data analysis. Existing feature screening methods often face substantial computational challenges when dealing with large sample sizes. To alleviate the computational burden, we propose a rank-based model-free sure independence screening method (CR-SIS) and its efficient variant, BanditCR-SIS. The CR-SIS method, based on Chatterjee's rank correlation, is as straightforward to implement as the sure independence screening (SIS) method based on Pearson correlation introduced by Fan and Lv (2008), but it is significantly more powerful in detecting nonlinear relationships between variables. Motivated by the multi-armed bandit (MAB) problem, we reformulate the feature screening procedure to significantly reduce the computational complexity of CR-SIS. For a predictor matrix of size $n \times p$, the computational cost of CR-SIS is $O(n \log(n)\, p)$, while BanditCR-SIS reduces this to $O(\sqrt{n} \log(n)\, p + n \log(n))$. Theoretically, we establish the sure screening property for both CR-SIS and BanditCR-SIS under mild regularity conditions. Furthermore, we demonstrate the effectiveness of our methods through extensive experimental studies on both synthetic and real-world datasets. The results highlight their superior performance compared to classical screening methods, while requiring significantly less computational time.
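
As a rough illustration of the CR-SIS idea described in the abstract, the sketch below scores each predictor by Chatterjee's rank correlation and keeps the top `d`. It assumes continuous data with no ties, in which case the coefficient is $\xi_n = 1 - 3\sum_{i=1}^{n-1} |r_{i+1} - r_i| / (n^2 - 1)$, where $r_i$ is the rank of the response after sorting the pairs by the predictor. The function names `chatterjee_xi` and `cr_sis`, the cutoff `d`, and the synthetic data are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def chatterjee_xi(x, y):
    """Chatterjee's rank correlation xi_n(x, y), assuming no ties in x or y."""
    n = len(x)
    order = np.argsort(x, kind="stable")        # sort the pairs by x
    ranks = np.argsort(np.argsort(y[order]))    # 0-based ranks of y in x-sorted order
    return 1.0 - 3.0 * np.abs(np.diff(ranks)).sum() / (n**2 - 1)

def cr_sis(X, y, d):
    """Return (indices of the d highest-scoring features, all scores)."""
    scores = np.array([chatterjee_xi(X[:, j], y) for j in range(X.shape[1])])
    top = np.argsort(scores)[::-1][:d]          # largest xi first
    return top, scores

# Synthetic example (illustrative only): y depends non-monotonically on feature 3.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(2000, 50))
y = np.cos(4.0 * X[:, 3]) + 0.1 * rng.standard_normal(2000)
top, scores = cr_sis(X, y, d=5)
```

Each score costs one sort of length $n$, which is where the $O(n \log(n)\, p)$ total in the abstract comes from. In this construction the response is an even function of feature 3, so the population Pearson correlation is zero, while the rank-based score can still pick the feature up.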
Problem

Research questions and friction points this paper is trying to address.

Efficient model-free feature screening for ultrahigh-dimensional data
Reducing computational complexity in large-sample nonlinear relationship detection
Overcoming computational challenges of existing feature screening methods
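
To get a feel for the size of the claimed saving, the ratio of the two costs quoted in the abstract can be simplified as follows. This is only a back-of-the-envelope reading of the asymptotic bounds: constant factors are ignored, and the values of $n$ and $p$ are illustrative assumptions rather than settings from the paper.

$$
\frac{n \log(n)\, p}{\sqrt{n} \log(n)\, p + n \log(n)}
= \frac{n p}{\sqrt{n}\, p + n}
= \frac{\sqrt{n}}{1 + \sqrt{n}/p}
$$

For example, with $n = 10^6$ and $p = 10^4$ the ratio is $1000 / 1.1 \approx 909$. When $p \gg \sqrt{n}$ the ratio approaches $\sqrt{n}$, so under these bounds the relative saving grows with the sample size.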
Innovation

Methods, ideas, or system contributions that make the work stand out.

Rank-based model-free screening via Chatterjee correlation
Multi-armed bandit reformulation reduces computational complexity
Achieves O(√n log(n)p + n log(n)) time complexity
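
The page does not spell out how BanditCR-SIS samples or schedules features, so the following is not the authors' algorithm. It is a simplified two-stage "subsample, then refine" heuristic in the spirit of bandit arm elimination, included only to show how a cost of the quoted shape can arise: every feature (arm) is scored once on about $\sqrt{n}$ random rows, and only a small set of surviving candidates is re-scored on all $n$ rows. The names `two_stage_screen`, `n_candidates`, and `seed` are hypothetical, and the data are assumed to have no ties.

```python
import numpy as np

def chatterjee_xi(x, y):
    """Chatterjee's rank correlation xi_n(x, y), assuming no ties in x or y."""
    n = len(x)
    order = np.argsort(x, kind="stable")
    ranks = np.argsort(np.argsort(y[order]))
    return 1.0 - 3.0 * np.abs(np.diff(ranks)).sum() / (n**2 - 1)

def two_stage_screen(X, y, d, n_candidates, seed=0):
    """Illustrative heuristic only: cheap scores for all features, full scores for a few."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    m = int(np.ceil(np.sqrt(n)))                      # subsample size ~ sqrt(n)
    idx = rng.choice(n, size=m, replace=False)

    # Stage 1: score every feature on the subsample, O(sqrt(n) log(n) p) overall.
    rough = np.array([chatterjee_xi(X[idx, j], y[idx]) for j in range(p)])
    candidates = np.argsort(rough)[::-1][:n_candidates]

    # Stage 2: re-score only the candidates on all rows, O(n log(n)) per candidate.
    refined = np.array([chatterjee_xi(X[:, j], y) for j in candidates])
    return candidates[np.argsort(refined)[::-1][:d]]  # best refined score first

# Synthetic example (illustrative only): feature 7 drives the response.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(10_000, 200))
y = np.cos(4.0 * X[:, 7]) + 0.1 * rng.standard_normal(10_000)
keep = two_stage_screen(X, y, d=3, n_candidates=10)
```

With `n_candidates` held fixed, the total work is $O(\sqrt{n} \log(n)\, p)$ for the first stage plus $O(n \log n)$ per refined candidate. A real bandit procedure would typically allocate samples adaptively across rounds rather than in a single cut, and this sketch carries none of the theoretical guarantees claimed for BanditCR-SIS.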
Xiaxue Ouyang
Institute of Statistics and Big Data, Renmin University of China
Xinlai Kang
Institute of Statistics and Big Data, Renmin University of China
Mengyu Li
Institute of Statistics and Big Data, Renmin University of China
Zhenxing Dou
School of Integrated Circuit Science and Engineering, Beihang University
Jun Yu
School of Mathematics and Statistics, Beijing Institute of Technology
Cheng Meng
Institute of Statistics and Big Data, Renmin University of China
Data Science, Optimal transport, Subsampling, Smoothing Spline