2D Stability Selection: Design Jittering for Doubly Stable Feature Selection

📅 2026-05-04

🤖 AI Summary
Feature selection in high-dimensional regression is sensitive to two sources of instability: sampling variability and measurement error in the design matrix. This work proposes a perturb-and-aggregate framework that injects controlled additive noise into the design matrix, refits a fixed base selector such as the Lasso, and records feature selection frequencies across a grid of noise levels to construct a stability path, thereby identifying features that are robust to both types of perturbation. The method uses the full sample size, which isolates the effect of design perturbations. Theoretical analysis shows that classical model-selection conditions are preserved under sufficiently small perturbations, and experiments on synthetic and real datasets show improved robustness compared with Stability Selection and standard base selectors.
📝 Abstract
We study feature selection in high-dimensional regression under two distinct sources of instability: sampling variability and measurement error in the design matrix. Stability Selection addresses the former through sub-sampling and aggregation, but does not explicitly stress-test robustness to noisy predictors. We introduce doubly stable feature selection, a perturb-and-aggregate framework that targets features whose inclusion is stable both across randomization and across increasing levels of design noise. The method injects controlled additive noise into the design matrix, fits a fixed base selector such as the Lasso on the perturbed data, and aggregates selection frequencies. Sweeping over a grid of noise levels yields a stability path that summarizes robustness to measurement error while using the full sample size and isolating the effect of design perturbations. On the theory side, we show that classical model-selection conditions are preserved under sufficiently small perturbations, with a high-probability extension for Gaussian noise. Empirically, experiments on synthetic and real datasets show improved robustness compared with Stability Selection and standard base selectors.
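The jitter-and-aggregate loop described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names `lasso_cd` and `jitter_stability_path`, the parameters `lam`, `sigmas`, and `n_draws`, the Gaussian noise model, and the synthetic data are all hypothetical, and a small NumPy-only coordinate-descent Lasso stands in for whichever base selector the paper uses.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate-descent Lasso for (1/2n)||y - Xb||^2 + lam * ||b||_1.

    Hypothetical stand-in for the base selector; not the paper's code.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()  # residual y - X @ beta, with beta = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual.
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            # Soft-thresholding update for coordinate j.
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            if new != beta[j]:
                resid -= X[:, j] * (new - beta[j])
                beta[j] = new
    return beta


def jitter_stability_path(X, y, sigmas, lam, n_draws=20, seed=0):
    """Selection frequency of each feature at each design-noise level.

    Row i holds the fraction of noisy refits at noise level sigmas[i]
    in which each feature received a nonzero coefficient.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    freqs = np.zeros((len(sigmas), p))
    for i, sigma in enumerate(sigmas):
        for _ in range(n_draws):
            # Jitter the full design matrix (assumed Gaussian noise).
            X_tilde = X + sigma * rng.standard_normal((n, p))
            beta = lasso_cd(X_tilde, y, lam)
            freqs[i] += beta != 0
        freqs[i] /= n_draws
    return freqs


# Synthetic example (assumed setup): 3 true features out of 20.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 20))
beta_true = np.zeros(20)
beta_true[:3] = 2.0
y = X @ beta_true + 0.5 * rng.standard_normal(100)

sigmas = [0.0, 0.25, 0.5]
path = jitter_stability_path(X, y, sigmas, lam=0.2)
print(np.round(path[:, :6], 2))
```

Each row of `path` is one point on the stability path. Features whose selection frequency stays high as the noise level grows would be the candidates for "doubly stable" selection, under whatever frequency threshold one chooses to apply.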
Problem

Research questions and friction points this paper is trying to address.

feature selection
measurement error
design matrix
stability
high-dimensional regression
Innovation

Methods, ideas, or system contributions that make the work stand out.

doubly stable feature selection
design jittering
stability path
measurement error robustness
perturb-and-aggregate