Stability Selection via Variable Decorrelation

📅 2025-05-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Lasso variable selection is unstable in high-dimensional settings with correlated predictors. This paper proposes a two-stage “decorrelation + Lasso” framework: predictors are first decorrelated via an orthogonal transformation, and standard Lasso regression is then applied to the transformed data. The paper positions variable decorrelation as a preprocessing step that improves selection stability, which it presents as a conceptual and methodological novelty. It shows that, under two assumptions, the post-decorrelation Lasso satisfies the irrepresentable condition, a key requirement for model selection consistency. It also extends the stability gains to low-dimensional regimes. Experiments across benchmark datasets report improvements in both selection stability and consistency over methods such as Stability Selection. An open-source R package, DVS, is released to implement the method.

📝 Abstract

The Lasso is a prominent algorithm for variable selection. However, its instability in the presence of correlated variables in the high-dimensional setting is well-documented. Although previous research has attempted to address this issue by modifying the Lasso loss function, this paper introduces an approach that simplifies the data processed by Lasso. We propose that decorrelating variables before applying the Lasso improves the stability of variable selection regardless of the direction of correlation among predictors. Furthermore, we highlight that the irrepresentable condition, which ensures consistency for the Lasso, is satisfied after variable decorrelation under two assumptions. In addition, by noting that the instability of the Lasso is not limited to high-dimensional settings, we demonstrate the effectiveness of the proposed approach for low-dimensional data. Finally, we present empirical results that indicate the efficacy of the proposed method across different variable selection techniques, highlighting its potential for broader application. The DVS R package is developed to facilitate the implementation of the methodology proposed in this paper.
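The two-stage idea in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's algorithm or the DVS package API: it assumes more observations than predictors (n > p), uses ZCA-style whitening through the SVD as the decorrelating transformation, and relies on the fact that the Lasso reduces to soft-thresholding when the design is orthonormal. The function names `decorrelate` and `lasso_orthonormal` are hypothetical.

```python
import numpy as np


def decorrelate(X):
    """ZCA-style decorrelation (an assumed choice of transformation).

    Returns Z with Z.T @ Z / n equal to the identity; assumes n > p.
    """
    Xc = X - X.mean(axis=0)                       # center each column
    U, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    # sqrt(n) * U @ Vt is the orthogonal design closest to Xc (Frobenius norm)
    return np.sqrt(X.shape[0]) * U @ Vt


def lasso_orthonormal(Z, y, lam):
    """Minimize (1/2n)||y - Z b||^2 + lam * ||b||_1 when Z.T @ Z / n = I.

    For an orthonormal design the minimizer is the soft-thresholded
    least-squares estimate Z.T @ y / n.
    """
    b_ols = Z.T @ y / Z.shape[0]
    return np.sign(b_ols) * np.maximum(np.abs(b_ols) - lam, 0.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 200, 5
    shared = rng.normal(size=(n, 1))
    X = 0.9 * shared + 0.1 * rng.normal(size=(n, p))   # strongly correlated columns
    y = X @ np.array([2.0, 0.0, -1.5, 0.0, 0.0]) + 0.5 * rng.normal(size=n)

    Z = decorrelate(X)
    print(np.round(Z.T @ Z / n, 6))               # Gram matrix of the new design
    coef = lasso_orthonormal(Z, y - y.mean(), lam=0.2)
    print(np.flatnonzero(coef))                   # indices of selected columns of Z
```

Note that the coefficients returned here refer to the decorrelated columns of `Z`, not to the original columns of `X`; how selections are mapped back to the original variables, and how the high-dimensional case is handled, is left to the paper.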

Problem

Research questions and friction points this paper is trying to address.

Addresses Lasso instability with correlated variables

Proposes variable decorrelation before Lasso application

Demonstrates the method's efficacy for both high- and low-dimensional data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Decorrelate variables before applying Lasso

Satisfy irrepresentable condition post-decorrelation

Effective for both high- and low-dimensional data
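For reference, the irrepresentable condition named in the list above is usually stated as follows. This is the standard textbook form, not necessarily the exact version or notation used in the paper; the symbols $C$, $S$, and $\eta$ are introduced here for illustration. Let $C = X^\top X / n$ be the sample Gram matrix, $S$ the index set of the truly nonzero coefficients, and $S^c$ its complement.

```latex
% Strong irrepresentable condition: for some constant \eta > 0,
\left\| C_{S^c S} \, C_{SS}^{-1} \, \operatorname{sign}(\beta_S) \right\|_\infty \le 1 - \eta .

% Idealized case of an exactly decorrelated design, C = I:
C_{S^c S} = 0
\quad\Longrightarrow\quad
\left\| C_{S^c S} \, C_{SS}^{-1} \, \operatorname{sign}(\beta_S) \right\|_\infty = 0 \le 1 - \eta
\quad \text{for any } \eta \in (0, 1] .
```

The second display covers only the idealized case in which the Gram matrix is exactly the identity, where the condition holds trivially. The two assumptions under which the paper establishes the condition are not spelled out in this summary.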
