Large Scale Partial Correlation Screening with Uncertainty Quantification

2025-09-21
AI Summary
Problem: In ultra-high-dimensional settings (p ≫ n), existing partial correlation screening methods suffer from uncontrolled type-I error rates. Method: This paper proposes PARSEC, a framework that leverages the exact mathematical relationship between regression coefficients and partial correlations to achieve rigorous multiple testing error control at fixed sample size. Contribution/Results: According to the authors, PARSEC is the first method in the fixed-n, large-p setting to provide error control across the family-wise error rate (FWER), k-FWER, and false discovery rate (FDR)/positive FDR (pFDR). It delivers exact marginal p-values and an algorithm with established computational complexity. The theoretical analysis covers the regime where n is fixed and p grows super-exponentially. Simulations and real-data experiments show that PARSEC outperforms the current state-of-the-art method.

Abstract
Identifying multivariate dependencies in high-dimensional data is an important problem in large-scale inference. This problem has motivated recent advances in mining (partial) correlations, which focus on the challenging ultra-high dimensional setting where the sample size, n, is fixed, while the number of features, p, grows without bound. The state-of-the-art method for partial correlation screening can lead to undesirable results. This paper introduces a novel principled framework for partial correlation screening with error control (PARSEC), which leverages the connection between partial correlations and regression coefficients. We establish the inferential properties of PARSEC when n is fixed and p grows super-exponentially. First, we provide "fixed-n-large-p" asymptotic expressions for the familywise error rate (FWER) and k-FWER. Equally importantly, our analysis leads to a novel discovery which permits the calculation of exact marginal p-values for controlling the false discovery rate (FDR), and also the positive FDR (pFDR). To our knowledge, no other competing approach in the "fixed-n large-p" setting allows for error control across the spectrum of multiple hypothesis testing metrics. We establish the computational complexity of PARSEC and rigorously demonstrate its scalability to the large p setting. The theory and methods are successfully validated on simulated and real data, and PARSEC is shown to outperform the current state-of-the-art.
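
The connection between partial correlations and regression coefficients that the abstract refers to can be illustrated with the classical population-level identity: if b_ij is the coefficient of X_j when X_i is regressed on all other variables, then the partial correlation of X_i and X_j given the rest equals sign(b_ij) * sqrt(b_ij * b_ji). The sketch below checks this numerically. It is not PARSEC itself: the function names are hypothetical, the covariance matrix is an arbitrary illustrative one, and it assumes a known, invertible covariance rather than the fixed-n, large-p sample setting the paper studies.

```python
import numpy as np


def partial_corr_from_precision(cov):
    """Partial correlation of each pair given all other variables."""
    omega = np.linalg.inv(cov)         # precision matrix
    d = np.sqrt(np.diag(omega))
    rho = -omega / np.outer(d, d)      # rho_ij = -omega_ij / sqrt(omega_ii * omega_jj)
    np.fill_diagonal(rho, 1.0)
    return rho


def regression_coef(cov, target, predictor):
    """Population coefficient of `predictor` when `target` is regressed on all other variables."""
    others = [k for k in range(cov.shape[0]) if k != target]
    beta = np.linalg.solve(cov[np.ix_(others, others)], cov[others, target])
    return beta[others.index(predictor)]


def partial_corr_from_regression(cov, i, j):
    """Recover the partial correlation from the two mirrored regression coefficients."""
    b_ij = regression_coef(cov, i, j)
    b_ji = regression_coef(cov, j, i)
    # The product is non-negative in exact arithmetic; clip to guard against rounding.
    return np.sign(b_ij) * np.sqrt(max(b_ij * b_ji, 0.0))


# Illustrative positive definite covariance matrix (assumed, not from the paper).
rng = np.random.default_rng(0)
a = rng.standard_normal((5, 5))
cov = a @ a.T + 5.0 * np.eye(5)

rho = partial_corr_from_precision(cov)
print(np.isclose(rho[0, 1], partial_corr_from_regression(cov, 0, 1)))
```

Both routes need an invertible covariance matrix, which a sample covariance is not when p exceeds n. That is one reason the p ≫ n setting calls for a dedicated method.
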
Problem

Research questions and friction points this paper is trying to address.

Identifying multivariate dependencies in ultra-high dimensional data
Controlling error rates in partial correlation screening with fixed sample size
Developing scalable methods for feature screening when p grows super-exponentially
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages the connection between partial correlations and regression coefficients
Provides fixed-n-large-p asymptotic error rate expressions
Enables exact p-values for false discovery rate control
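
Once marginal p-values are available, FDR control is commonly carried out with a step-up rule. The sketch below uses the standard Benjamini-Hochberg procedure as a stand-in. This page does not state which procedure PARSEC applies, so the function name `benjamini_hochberg` and the input p-values are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a boolean mask of hypotheses rejected by the Benjamini-Hochberg step-up rule."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                          # indices that sort p ascending
    thresholds = alpha * np.arange(1, m + 1) / m   # alpha * k / m for rank k
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])           # largest rank meeting its threshold
        reject[order[: k + 1]] = True              # reject everything up to that rank
    return reject


# Made-up p-values for illustration only.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(pvals, alpha=0.05)
print(rejected.sum())
```

Benjamini-Hochberg controls the FDR at level alpha when the p-values are independent or positively dependent. Whether that assumption holds for partial correlation screening statistics depends on the method that produces the p-values.
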
Emily Neo
University of Sydney
Peter Radchenko
University of Sydney
Bala Rajaratnam
University of California, Davis