🤖 AI Summary
High-dimensional immunophenotypic data often exhibit both severe multicollinearity and non-normal (skewed) distributions, leading to biased variable selection and unstable coefficient estimation in conventional regularized methods such as Lasso. To address this, we propose the Bootstrap-Enhanced Regularization Method (BERM), a novel two-stage collaborative correction framework. BERM first mitigates multicollinearity effects via bootstrap resampling, then jointly applies robust standardization and stepwise dimensionality reduction to correct for distributional skewness. In extensive simulations and real-world type 1 diabetes (T1D) immunophenotyping data, BERM significantly improves biomarker identification accuracy—reducing false-negative rates by 32% relative to state-of-the-art methods—and successfully validates multiple novel immune parameters with statistically significant disease associations. BERM establishes a new paradigm for stable and consistent variable selection in high-dimensional biomedical data characterized by high correlation and distributional asymmetry.
📝 Abstract
Accurate prediction and identification of variables associated with outcomes or disease states are critical for advancing diagnosis, prognosis, and precision medicine in biomedical research. Regularized regression techniques, such as the lasso, are widely employed to enhance interpretability by reducing model complexity and identifying significant variables. However, when these methods are applied to biomedical datasets, e.g., immunophenotyping datasets, two major challenges may lead to unsatisfactory results: 1) high correlation between predictors, which can cause important variables to be excluded from selection when they are correlated with predictors that are already included, and 2) the presence of skewness, which violates key statistical assumptions of these methods. Current approaches that fail to address these issues simultaneously may lead to biased interpretations and unreliable coefficient estimates. To overcome these limitations, we propose a novel two-step approach, the Bootstrap-Enhanced Regularization Method (BERM). BERM outperforms existing two-step approaches and demonstrates consistent performance in terms of variable selection and estimation accuracy across simulated sparsity scenarios. We further demonstrate the effectiveness of BERM by applying it to a human immunophenotyping dataset, identifying important immune parameters associated with the autoimmune disease type 1 diabetes.
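The abstract does not spell out the BERM algorithm, so the following is not the authors' method. It is only a minimal, generic sketch of the two ingredients named above: refitting a lasso on bootstrap resamples to see how often each predictor is selected when predictors are correlated, and a median/IQR standardization for skewed predictors. All function names, the penalty `lam`, the simulated data, and the use of selection frequency as a summary are assumptions made for illustration.

```python
import math
import random

def robust_scale(col):
    """Center by the median and scale by an index-based interquartile range."""
    s = sorted(col)
    n = len(s)
    med = s[n // 2] if n % 2 else 0.5 * (s[n // 2 - 1] + s[n // 2])
    iqr = s[(3 * n) // 4] - s[n // 4]
    iqr = iqr or 1.0  # guard against a zero spread
    return [(v - med) / iqr for v in col]

def soft_threshold(z, t):
    """Shrink z toward zero by t, returning zero when |z| <= t."""
    return math.copysign(max(abs(z) - t, 0.0), z)

def lasso_cd(X, y, lam, n_iter=50):
    """Lasso via cyclic coordinate descent; X is a list of predictor columns.

    Minimizes (1/2n)||y - Xb||^2 + lam * ||b||_1 on mean-centered data.
    """
    n, p = len(y), len(X)
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]
    Xc = [[v - sum(col) / n for v in col] for col in X]
    col_sq = [sum(v * v for v in col) / n for col in Xc]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual
            rho = sum(Xc[j][i] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * Xc[j][i]
                beta[j] = new
    return beta

def bootstrap_selection_frequency(X, y, lam, n_boot=30, seed=0):
    """Fraction of bootstrap resamples in which each predictor is nonzero."""
    rng = random.Random(seed)
    n, p = len(y), len(X)
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        Xb = [robust_scale([col[i] for i in idx]) for col in X]
        yb = [y[i] for i in idx]
        beta = lasso_cd(Xb, yb, lam)
        for j in range(p):
            if beta[j] != 0.0:
                counts[j] += 1
    return [c / n_boot for c in counts]

# Simulated data (an assumption, not the T1D dataset): skewed predictors,
# with column 1 built as a noisy copy of column 0 to induce high correlation.
rng = random.Random(1)
n, p = 80, 6
X = [[rng.lognormvariate(0.0, 0.75) for _ in range(n)] for _ in range(p)]
X[1] = [x + 0.1 * rng.gauss(0.0, 1.0) for x in X[0]]
y = [1.5 * X[0][i] + rng.gauss(0.0, 1.0) for i in range(n)]

freq = bootstrap_selection_frequency(X, y, lam=0.1)
print([round(f, 2) for f in freq])
```

Reading the printed frequencies side by side for columns 0 and 1 shows how selection is shared between two highly correlated predictors across resamples, which is the instability the abstract describes for a single lasso fit.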