High-Dimensional Independence Testing via Maximum and Average Distance Correlations

📅 2020-01-04
📈 Citations: 8
Influential: 0
🤖 AI Summary
This paper addresses high-dimensional multivariate independence testing. It proposes nonparametric tests based on the maximum distance correlation (MaxDCOR) and the average distance correlation (AvgDCOR). The tests work with either Euclidean distance or the Gaussian kernel as the underlying metric, and the paper is described as the first to systematically construct the corresponding test statistics in high dimensions. The authors establish statistical consistency and derive a fast chi-square approximation for the null distribution, which avoids the high computational cost and limited asymptotic theory of classical distance correlation. Experiments show that the proposed method achieves over 30% higher detection power than standard distance correlation under sparse strong dependence. It also outperforms existing approaches in synthetic multivariate dependence settings and on real-world plasma data relating cancer types to peptide levels, capturing complex, high-order multivariate dependence structures.
📝 Abstract
This paper introduces and investigates the utilization of maximum and average distance correlations for multivariate independence testing. We characterize their consistency properties in high-dimensional settings with respect to the number of marginally dependent dimensions, assess the advantages of each test statistic, examine their respective null distributions, and present a fast chi-square-based testing procedure. The resulting tests are non-parametric and applicable to both Euclidean distance and the Gaussian kernel as the underlying metric. To better understand the practical use cases of the proposed tests, we evaluate the empirical performance of the maximum distance correlation, average distance correlation, and the original distance correlation across various multivariate dependence scenarios, as well as conduct a real data experiment to test the presence of various cancer types and peptide levels in human plasma.
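As a rough illustration of the two statistics named in the abstract, the sketch below computes a distance correlation between each coordinate of X and the whole of Y, then takes the maximum and the average over coordinates. This per-coordinate construction is an assumption made for illustration; the paper's exact definitions may differ. The helper names (`centered_distances`, `distance_correlation`, `max_and_average_dcor`) are hypothetical, the estimator is the simple biased sample version with Euclidean distance, and the fast chi-square test is not implemented here.

```python
import math
import random


def centered_distances(points):
    """Double-centered Euclidean distance matrix of a list of points."""
    n = len(points)
    dist = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in dist]
    grand_mean = sum(row_mean) / n
    # The distance matrix is symmetric, so row means double as column means.
    return [
        [dist[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Biased sample distance correlation of paired samples (lists of vectors)."""
    a, b = centered_distances(x), centered_distances(y)
    n = len(x)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    dcov2 = sum(a[i][j] * b[i][j] for i, j in pairs) / n**2
    dvar_x = sum(a[i][j] ** 2 for i, j in pairs) / n**2
    dvar_y = sum(b[i][j] ** 2 for i, j in pairs) / n**2
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0  # a constant sample carries no dependence information
    return math.sqrt(max(dcov2, 0.0) / denom)


def max_and_average_dcor(x, y):
    """Assumed construction: one distance correlation per coordinate of x."""
    p = len(x[0])
    per_dim = [distance_correlation([[row[k]] for row in x], y) for k in range(p)]
    return max(per_dim), sum(per_dim) / p, per_dim


# Toy data: y depends only on the first of five coordinates of x.
random.seed(0)
x = [[random.gauss(0, 1) for _ in range(5)] for _ in range(60)]
y = [[row[0] ** 2] for row in x]
max_dc, avg_dc, per_dim = max_and_average_dcor(x, y)
print("per-coordinate:", [round(v, 3) for v in per_dim])
print("maximum:", round(max_dc, 3), "average:", round(avg_dc, 3))
```

In a sparse setting like this toy example, the maximum is driven by the single dependent coordinate while the average is diluted by the noise coordinates, which is the kind of trade-off between the two statistics that the abstract says the paper assesses.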
Problem

Research questions and friction points this paper is trying to address.

Testing multivariate independence in high-dimensional settings
Comparing maximum and average distance correlations for effectiveness
Developing non-parametric tests for Euclidean and Gaussian kernel metrics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses maximum and average distance correlations
Non-parametric tests for high-dimensional data
Fast chi-square-based testing procedure
Cencheng Shen
University of Delaware
Machine Learning; Correlation and Dependence
Yuexiao Dong
Department of Statistics, Operations, and Data Science, Temple University.