Scalable Ultra-High-Dimensional Quantile Regression with Genomic Applications

📅 2026-01-06
🤖 AI Summary
This study addresses the computational and memory bottlenecks of conventional quantile regression in ultra-high-dimensional settings (where $p \gg n$) by proposing a Feature-Splitting Proximal Point Algorithm, termed FS-QRPPA. This method introduces, for the first time, a feature-splitting strategy tailored to ultra-high-dimensional data, integrating a proximal point algorithm with parallel computing to efficiently solve penalized quantile regression problems. Leveraging variational analysis theory, the authors establish that the algorithm achieves a Q-linear convergence rate. Implemented in the R package fsQRPPA, the proposed approach demonstrates superior performance over existing methods on UK Biobank genomic data, exhibiting notable improvements in computational efficiency, estimation accuracy of regression coefficients, and coverage probability of prediction intervals.
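
For orientation, the quantities named in the summary can be written in a generic textbook form. This is a sketch only: the penalty $P_\lambda$, the number of feature blocks $K$, the iterate symbol $\theta^{t}$, and the norm are assumptions introduced here for illustration, since the summary does not give the exact formulation used by FS-QRPPA.

```latex
% Generic penalized quantile regression at level \tau \in (0,1).
% P_\lambda is an unspecified penalty (assumption; not taken from the paper).
\[
\hat{\beta} \in \arg\min_{\beta \in \mathbb{R}^{p}}
  \frac{1}{n} \sum_{i=1}^{n} \rho_{\tau}\bigl(y_i - x_i^{\top}\beta\bigr) + P_{\lambda}(\beta),
\qquad
\rho_{\tau}(u) = u\,\bigl(\tau - \mathbf{1}\{u < 0\}\bigr).
\]

% Feature splitting: partition the p columns of X into K blocks (K is hypothetical),
% so the linear predictor decomposes into block-wise contributions.
\[
X\beta = \sum_{k=1}^{K} X_{k}\,\beta_{k}.
\]

% Standard definition of Q-linear convergence of iterates \theta^{t} to \theta^{\ast}:
\[
\bigl\|\theta^{t+1} - \theta^{\ast}\bigr\| \le c\,\bigl\|\theta^{t} - \theta^{\ast}\bigr\|
\quad \text{for all sufficiently large } t, \text{ with some } c \in (0,1).
\]
```

The block decomposition is what makes a $p \gg n$ problem amenable to parallel computation: each block product $X_k \beta_k$ can be evaluated independently before the results are summed.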

📝 Abstract
Modern datasets arising from social media, genomics, and biomedical informatics are often heterogeneous and (ultra) high-dimensional, creating substantial challenges for conventional modeling techniques. Quantile regression (QR) not only offers a flexible way to capture heterogeneous effects across the conditional distribution of an outcome, but also naturally produces prediction intervals that help quantify uncertainty in future predictions. However, classical QR methods can face serious memory and computational constraints in large-scale settings. These limitations motivate the use of parallel computing to maintain tractability. While extensive work has examined sample-splitting strategies in settings where the number of observations $n$ greatly exceeds the number of features $p$, the equally important (ultra) high-dimensional regime ($p \gg n$) has been comparatively underexplored. To address this gap, we introduce a feature-splitting proximal point algorithm, FS-QRPPA, for penalized QR in the (ultra) high-dimensional regime. Leveraging recent developments in variational analysis, we establish a Q-linear convergence rate for FS-QRPPA and demonstrate its superior scalability in large-scale genomic applications from the UK Biobank relative to existing methods. Moreover, FS-QRPPA yields more accurate coefficient estimates and better coverage for prediction intervals than current approaches. We provide a parallel implementation in the R package fsQRPPA, making penalized QR tractable on large-scale datasets.
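
The building blocks mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the FS-QRPPA algorithm and does not reproduce the fsQRPPA package interface; it only shows the standard quantile (check) loss, the closed-form proximal operator of that loss, and the fact that the linear predictor can be assembled from column blocks. All function names (`check_loss`, `prox_check_loss`, `split_features`, `fitted_values_by_block`) and the choice of four blocks are hypothetical.

```python
import numpy as np


def check_loss(r, tau):
    """Quantile (check) loss rho_tau(r) = r * (tau - 1{r < 0}), elementwise."""
    r = np.asarray(r, dtype=float)
    return r * (tau - (r < 0))


def prox_check_loss(v, tau, alpha):
    """Proximal operator of alpha * rho_tau, elementwise.

    Solves argmin_r alpha * rho_tau(r) + 0.5 * (r - v)**2, which has the
    closed form v - clip(v, alpha * (tau - 1), alpha * tau).
    """
    v = np.asarray(v, dtype=float)
    return v - np.clip(v, alpha * (tau - 1.0), alpha * tau)


def split_features(p, n_blocks):
    """Partition column indices 0..p-1 into n_blocks contiguous blocks."""
    return np.array_split(np.arange(p), n_blocks)


def fitted_values_by_block(X, beta, blocks):
    """Compute X @ beta as a sum of block products X_k @ beta_k.

    Each term depends only on its own block, so in a parallel setting the
    terms could be computed by separate workers (illustrative assumption).
    """
    return sum(X[:, idx] @ beta[idx] for idx in blocks)


# Small p > n example with synthetic data (not from the paper).
rng = np.random.default_rng(0)
n, p = 5, 12
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p)
blocks = split_features(p, n_blocks=4)
fitted = fitted_values_by_block(X, beta, blocks)
```

Summing the block products gives the same fitted values as `X @ beta`, which is why a feature-wise partition leaves the model unchanged while reducing how much of the design matrix any single worker has to hold.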
Problem

Research questions and friction points this paper is trying to address.

ultra-high-dimensional
quantile regression
scalability
genomic applications
computational constraints
Innovation

Methods, ideas, or system contributions that make the work stand out.

feature-splitting
quantile regression
ultra-high-dimensional
proximal point algorithm
scalability
Hanqing Wu
Department of Statistics, Lund University, Lund, Sweden
Jonas Wallin
Lund University
statistics
I. Ionita-Laza
Department of Biostatistics, Columbia University, New York, USA