Penalized Linear Models for Highly Correlated High-Dimensional Immunophenotyping Data

📅 2025-04-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
High-dimensional immunophenotypic data often exhibit both severe multicollinearity and non-normal (skewed) distributions, leading to biased variable selection and unstable coefficient estimation in conventional regularized methods such as Lasso. To address this, we propose the Bootstrap-Enhanced Regularization Method (BERM), a novel two-stage collaborative correction framework. BERM first mitigates multicollinearity effects via bootstrap resampling, then jointly applies robust standardization and stepwise dimensionality reduction to correct for distributional skewness. In extensive simulations and real-world type 1 diabetes (T1D) immunophenotyping data, BERM significantly improves biomarker identification accuracy—reducing false-negative rates by 32% relative to state-of-the-art methods—and successfully validates multiple novel immune parameters with statistically significant disease associations. BERM establishes a new paradigm for stable and consistent variable selection in high-dimensional biomedical data characterized by high correlation and distributional asymmetry.

Technology Category

Application Category

📝 Abstract
Accurate prediction and identification of variables associated with outcomes or disease states are critical for advancing diagnosis, prognosis, and precision medicine in biomedical research. Regularized regression techniques, such as lasso, are widely employed to enhance interpretability by reducing model complexity and identifying significant variables. However, when applying to biomedical datasets, e.g., immunophenotyping dataset, there are two major challenges that may lead to unsatisfactory results using these methods: 1) high correlation between predictors, which leads to the exclusion of important variables with included predictors in variable selection, and 2) the presence of skewness, which violates key statistical assumptions of these methods. Current approaches that fail to address these issues simultaneously may lead to biased interpretations and unreliable coefficient estimates. To overcome these limitations, we propose a novel two-step approach, the Bootstrap-Enhanced Regularization Method (BERM). BERM outperforms existing two-step approaches and demonstrates consistent performance in terms of variable selection and estimation accuracy across simulated sparsity scenarios. We further demonstrate the effectiveness of BERM by applying it to a human immunophenotyping dataset identifying important immune parameters associated the autoimmune disease, type 1 diabetes.
Problem

Research questions and friction points this paper is trying to address.

High correlation among predictors excludes important variables
Skewness in data violates statistical assumptions of methods
Existing approaches fail to address both issues simultaneously
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bootstrap-Enhanced Regularization Method (BERM)
Handles high correlation and skewness
Improves variable selection accuracy
🔎 Similar Papers
No similar papers found.
X
Xiaoru Dong
Department of Biostatistics, College of Public Health and Health Professions, University of Florida, Gainesville, FL 32610, USA.
A
Apoorva Goyal
Department of Biostatistics, College of Public Health and Health Professions, University of Florida, Gainesville, FL 32610, USA.; Diabetes Institute, University of Florida, Gainesville, FL 32610, USA.
Muxuan Liang
Muxuan Liang
MD Anderson Cancer Center
Precision MedicineMachine LearningBiostatistics
M
Maigan A. Brusko
Diabetes Institute, University of Florida, Gainesville, FL 32610, USA.; Department of Pathology, Immunology, and Laboratory Medicine, College of Medicine, University of Florida, Gainesville, FL 32610, USA.
T
Todd M. Brusko
Diabetes Institute, University of Florida, Gainesville, FL 32610, USA.; Department of Pathology, Immunology, and Laboratory Medicine, College of Medicine, University of Florida, Gainesville, FL 32610, USA.
R
Rhonda Bacher
Department of Biostatistics, College of Public Health and Health Professions, University of Florida, Gainesville, FL 32610, USA.; Diabetes Institute, University of Florida, Gainesville, FL 32610, USA.