Integrated path stability selection

📅 2024-03-23
📈 Citations: 2
Influential: 0
🤖 AI Summary
Stability selection theoretically controls the expected number of false positives, E(FP), but existing upper bounds on E(FP) are loose, resulting in low feature recall. To address this, we propose Integrated Path Stability Selection (IPSS), which replaces the conventional maximum over the stability path with an integral along the regularization path. This reformulation yields a rigorous upper bound on E(FP) that is tighter by several orders of magnitude, enabling significantly higher true positive rates under the same target E(FP). IPSS has the same computational cost as standard stability selection and requires only one user-specified parameter, which can be either the target E(FP) or the target false discovery rate (FDR). Evaluated on real-world prostate and colon cancer datasets, alongside multiple simulation studies, IPSS achieves average improvements of 37–62% in true positive rate at fixed E(FP) targets.

📝 Abstract
Stability selection is a popular method for improving feature selection algorithms. One of its key attributes is that it provides theoretical upper bounds on the expected number of false positives, E(FP), enabling control of false positives in practice. However, stability selection often selects very few features, resulting in low sensitivity. This is because existing bounds on E(FP) are relatively loose, causing stability selection to overestimate the number of false positives. In this paper, we introduce a novel approach to stability selection based on integrating stability paths rather than maximizing over them. This yields upper bounds on E(FP) that are orders of magnitude stronger than previous bounds, leading to significantly more true positives in practice for the same target E(FP). Furthermore, our method takes the same amount of computation as the original stability selection algorithm, and only requires one user-specified parameter, which can be either the target E(FP) or target false discovery rate. We demonstrate the method on simulations and real data from prostate and colon cancer studies.
Problem

Research questions and friction points this paper is trying to address.

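Existing upper bounds on E(FP) are loose, so stability selection overestimates the number of false positives
Stability selection consequently selects very few features, resulting in low sensitivity
A tighter bound should not require more computation than the original stability selection algorithm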
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates stability paths for selection
Provides stronger false positive bounds
Maintains original algorithm computation
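
The contrast between maximizing over stability paths and integrating them can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the base selector is a simple marginal-correlation threshold standing in for whatever selection algorithm is used in practice, and the integral is taken over raw selection frequencies with the trapezoid rule. The paper's actual integrand and the selection threshold it derives from the target E(FP) are not reproduced.

```python
import numpy as np


def stability_paths(X, y, lambdas, n_subsamples=50, seed=0):
    """Estimate selection frequencies pi_j(lambda) from random half-subsamples."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros((len(lambdas), p))
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=n // 2, replace=False)
        Xc = X[idx] - X[idx].mean(axis=0)
        yc = y[idx] - y[idx].mean()
        # Stand-in base selector (assumption): select feature j at level
        # lambda when its absolute marginal correlation with y exceeds lambda.
        corr = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
        counts += corr[None, :] > lambdas[:, None]
    return counts / n_subsamples  # shape: (len(lambdas), p)


def max_score(paths):
    """Conventional aggregation: maximum selection frequency over the path."""
    return paths.max(axis=0)


def integral_score(paths, lambdas):
    """Illustrative aggregation: trapezoid-rule integral of the selection
    frequency over lambda, divided by the interval length so it lies in [0, 1]."""
    widths = np.diff(lambdas)
    midpoints = 0.5 * (paths[1:] + paths[:-1])
    return (widths[:, None] * midpoints).sum(axis=0) / (lambdas[-1] - lambdas[0])


# Synthetic data (assumption): only the first two features affect y.
rng = np.random.default_rng(1)
n, p = 200, 20
X = rng.standard_normal((n, p))
y = X[:, 0] + X[:, 1] + 0.5 * rng.standard_normal(n)

lambdas = np.linspace(0.05, 0.6, 30)
paths = stability_paths(X, y, lambdas)
by_max = max_score(paths)
by_integral = integral_score(paths, lambdas)

print("max scores:     ", np.round(by_max, 2))
print("integral scores:", np.round(by_integral, 2))
```

In this toy setting the maximum rewards a feature that is selected often at a single small regularization value, whereas the integral rewards features that stay selected across the whole path. How either score is turned into a selection threshold with an E(FP) guarantee is the subject of the paper and is not shown here.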
Omar Melikechi
Duke University
statistics, biostatistics, machine learning
Jeffrey W. Miller
Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA