Learning Massive-scale Partial Correlation Networks in Clinical Multi-omics Studies with HP-ACCORD

📅 2024-12-16
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To balance statistical consistency against computational scalability when inferring partial correlation networks from high-dimensional clinical multi-omics data, this paper proposes a scalable precision matrix estimation framework. Methodologically, it (1) reparameterizes the target precision matrix within a pseudolikelihood framework while preserving its sparsity pattern; (2) estimates it by minimizing an ℓ₁-penalized empirical risk built on a new loss function, with estimation and selection consistency under high-dimensional assumptions; and (3) combines an operator-splitting algorithm with communication-avoiding distributed matrix multiplication for high-performance parallel computation. Empirically, the implementation was tested on simulated data with up to one million variables whose dependency structures resemble biological networks. Applied to hepatocellular carcinoma dual-omics data, the estimated network prioritized key transcription factors and co-activators with superior specificity.

📝 Abstract
Graphical model estimation from modern multi-omics data requires a balance between statistical estimation performance and computational scalability. We introduce a novel pseudolikelihood-based graphical model framework that reparameterizes the target precision matrix while preserving its sparsity pattern and estimates it by minimizing an $\ell_1$-penalized empirical risk based on a new loss function. The proposed estimator maintains estimation and selection consistency in various metrics under high-dimensional assumptions. The associated optimization problem allows for a provably fast computation algorithm using a novel operator-splitting approach and communication-avoiding distributed matrix multiplication. A high-performance computing implementation of our framework was tested on simulated data with up to one million variables demonstrating complex dependency structures akin to biological networks. Leveraging this scalability, we estimated a partial correlation network from a dual-omic liver cancer data set. The co-expression network estimated from the ultrahigh-dimensional data showed superior specificity in prioritizing key transcription factors and co-activators by excluding the impact of epigenomic regulation, demonstrating the value of computational scalability in multi-omic data analysis.
Problem

Research questions and friction points this paper is trying to address.

Balancing statistical performance and computational scalability in multi-omics graphical models
Estimating high-dimensional precision matrices with sparsity and consistency guarantees
Enabling large-scale partial correlation network analysis for clinical multi-omics data
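
As background for the items above, the link between a precision matrix and a partial correlation network can be sketched in a few lines. This is a minimal illustration of the standard identity ρ_ij = −ω_ij / √(ω_ii ω_jj), not the paper's implementation; the function names `partial_correlations` and `edges` and the toy 3×3 matrix are assumptions made for the example.

```python
import math

def partial_correlations(omega):
    """Map a precision matrix (list of lists) to partial correlations.

    Uses the standard identity rho_ij = -omega_ij / sqrt(omega_ii * omega_jj).
    """
    p = len(omega)
    rho = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i == j:
                rho[i][j] = 1.0
            else:
                rho[i][j] = -omega[i][j] / math.sqrt(omega[i][i] * omega[j][j])
    return rho

def edges(omega, tol=1e-12):
    """List the network edges: pairs (i, j), i < j, with a nonzero off-diagonal entry."""
    p = len(omega)
    return [(i, j) for i in range(p) for j in range(i + 1, p)
            if abs(omega[i][j]) > tol]

# Hypothetical toy precision matrix: a chain 0 - 1 - 2 (variables 0 and 2
# are conditionally independent given variable 1).
omega = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
]
rho = partial_correlations(omega)
print(edges(omega))
```

The zero pattern of the precision matrix is exactly the edge set of the network, which is why sparsity in the estimate matters for interpretation.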
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pseudolikelihood-based graphical model reparameterization
L1-penalized empirical risk minimization
Communication-avoiding distributed matrix multiplication
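
To make the ℓ₁ penalty concrete: the proximal operator of the ℓ₁ norm is elementwise soft-thresholding, a building block commonly used inside operator-splitting schemes for sparse estimation. The sketch below shows only that generic operator. It is not the HP-ACCORD algorithm, and both the helper names and the choice to leave the diagonal unpenalized are assumptions made for illustration.

```python
def soft_threshold(x, lam):
    """Proximal operator of lam * |x|: shrink x toward zero by lam."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def soft_threshold_offdiag(matrix, lam):
    """Apply soft-thresholding to off-diagonal entries only.

    Assumption for this sketch: the diagonal is left unpenalized.
    """
    p = len(matrix)
    return [
        [matrix[i][j] if i == j else soft_threshold(matrix[i][j], lam)
         for j in range(p)]
        for i in range(p)
    ]

# Hypothetical 2x2 iterate: the small entry is zeroed, the large one is shrunk.
iterate = [[2.0, 0.3], [-1.2, 3.0]]
shrunk = soft_threshold_offdiag(iterate, 0.5)
print(shrunk)
```

Entries whose magnitude falls below the threshold become exactly zero, which is how an ℓ₁ penalty yields a sparse network rather than merely small coefficients.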
Authors
Sungdong Lee
Department of Medicine, National University of Singapore
Joshua Bang
Department of Statistics and Applied Probability, University of California, Santa Barbara
Youngrae Kim
University of Southern California
Machine Learning, Computer Vision, Domain Adaptation
Hyungwon Choi
Department of Medicine, National University of Singapore
Sang-Yun Oh
Department of Statistics and Applied Probability, University of California, Santa Barbara
Joong-Ho Won
Department of Statistics, Seoul National University