A multivariate extension of Azadkia-Chatterjee's rank coefficient

📅 2025-12-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses a limitation of the Azadkia–Chatterjee rank correlation coefficient: it has no natural extension for measuring dependence between two random vectors. The authors propose a multivariate generalization that quantifies nonlinear dependence between $\boldsymbol{Y} \in \mathbb{R}^{d_Y}$ and $\boldsymbol{Z} \in \mathbb{R}^{d_Z}$, requiring only i.i.d. samples and non-degeneracy of $\boldsymbol{Y}$. The estimator preserves the key theoretical properties: it converges almost surely to a limit in $[0,1]$ that equals 0 if and only if $\boldsymbol{Y} \perp \boldsymbol{Z}$, and equals 1 if and only if $\boldsymbol{Y}$ is almost surely a measurable function of $\boldsymbol{Z}$; it also supports consistent conditional dependence estimation and a monotonic bias analysis under model misspecification. Constructed from ranks and nearest neighbors, it can be computed efficiently with a merge-sort-based algorithm in time $O(n (\log n)^{d_Y})$. The associated independence test is consistent and asymptotically normal under the null. Numerical experiments demonstrate robustness in high dimensions and competitive statistical power.
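As background, the scalar rank coefficient this line of work builds on (Chatterjee's $\xi_n$) is simple to compute from ranks alone. Below is a minimal sketch of that univariate coefficient, not the paper's multivariate estimator; the function name and the no-ties assumption are ours:

```python
import numpy as np

def chatterjee_xi(x, y):
    """Chatterjee's rank correlation xi_n for scalar x, y.

    Sort the pairs by x, take the ranks r_i of the corresponding y values,
    and return 1 - 3 * sum_i |r_{i+1} - r_i| / (n^2 - 1).
    Assumes continuous data (no ties).
    """
    x = np.asarray(x)
    y = np.asarray(y)
    n = len(x)
    order = np.argsort(x)                        # sort pairs by x
    r = np.argsort(np.argsort(y[order])) + 1     # ranks of y in x-order
    return 1.0 - 3.0 * np.abs(np.diff(r)).sum() / (n * n - 1)
```

Like the limit described above, $\xi_n$ tends to 0 under independence and to 1 when $y$ is a noiseless function of $x$; for a perfectly monotone sample it equals $1 - 3/(n+1)$ exactly.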

📝 Abstract
The Azadkia-Chatterjee coefficient is a rank-based measure of dependence between a random variable $Y \in \mathbb{R}$ and a random vector ${\boldsymbol Z} \in \mathbb{R}^{d_Z}$. This paper proposes a multivariate extension that measures dependence between random vectors ${\boldsymbol Y} \in \mathbb{R}^{d_Y}$ and ${\boldsymbol Z} \in \mathbb{R}^{d_Z}$, based on $n$ i.i.d. samples. The proposed coefficient converges almost surely to a limit with the following properties: i) it lies in $[0, 1]$; ii) it equals zero if and only if ${\boldsymbol Y}$ and ${\boldsymbol Z}$ are independent; and iii) it equals one if and only if ${\boldsymbol Y}$ is almost surely a function of ${\boldsymbol Z}$. Remarkably, the only assumption required by this convergence is that ${\boldsymbol Y}$ is not almost surely a constant. We further prove that under the same mild condition, the coefficient is asymptotically normal when ${\boldsymbol Y}$ and ${\boldsymbol Z}$ are independent, and propose a merge sort based algorithm to calculate this coefficient in time complexity $O(n (\log n)^{d_Y})$. Finally, we show that it can be used to measure conditional dependence between ${\boldsymbol Y}$ and ${\boldsymbol Z}$ conditional on a third random vector ${\boldsymbol X}$, and prove that the measure is monotonic with respect to the deviation from an independence distribution under certain model restrictions.
Problem

Research questions and friction points this paper is trying to address.

Extends Azadkia-Chatterjee coefficient to measure dependence between multivariate random vectors.
Proposes a rank-based coefficient with properties of independence and functional dependence detection.
Develops an efficient algorithm for computation and applies it to conditional dependence measurement.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends rank-based dependence coefficient to multivariate random vectors
Proposes a merge-sort-based algorithm with $O(n (\log n)^{d_Y})$ time complexity
Measures conditional dependence and proves monotonicity with respect to deviation from independence
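For reference, the original Azadkia–Chatterjee coefficient that these contributions extend (scalar $Y$, vector $Z$, no conditioning variable) can be sketched with brute-force nearest neighbours. This is a hedged illustration under a no-ties assumption, not the paper's multivariate estimator or its merge-sort algorithm; the function name is ours:

```python
import numpy as np

def azadkia_chatterjee_T(y, z):
    """Azadkia-Chatterjee dependence coefficient T_n(Y, Z) for scalar y
    and (possibly multivariate) z, without a conditioning variable:

        T_n = sum_i (n * min(R_i, R_{M(i)}) - L_i^2) / sum_i L_i * (n - L_i)

    where R_i = #{j : y_j <= y_i}, L_i = #{j : y_j >= y_i}, and M(i)
    indexes the nearest neighbour of z_i. Assumes continuous data.
    """
    y = np.asarray(y, dtype=float)
    z = np.atleast_2d(np.asarray(z, dtype=float))
    if z.shape[0] != len(y):
        z = z.T                                   # rows = observations
    n = len(y)
    R = np.array([(y <= yi).sum() for yi in y])   # right ranks
    L = np.array([(y >= yi).sum() for yi in y])   # left ranks
    d = ((z[:, None, :] - z[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)                   # exclude self-match
    M = d.argmin(axis=1)                          # nearest neighbour of z_i
    num = (n * np.minimum(R, R[M]) - L ** 2).sum()
    den = (L * (n - L)).sum()
    return num / den
```

The brute-force neighbour search costs $O(n^2)$; the paper's merge-sort approach is what brings the multivariate computation down to $O(n (\log n)^{d_Y})$.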
🔎 Similar Papers
No similar papers found.
Wenjie Huang
Shanghai Jiao Tong University
Point cloud compression, video compression, image compression
Zonghan Li
Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
Yuhao Wang
Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China; Shanghai Qi Zhi Institute, Shanghai, China