🤖 AI Summary
This paper addresses a limitation of the Azadkia–Chatterjee rank correlation coefficient: it lacks a natural extension for measuring dependence between two random vectors. The authors propose a multivariate generalization that quantifies nonlinear dependence between $Y \in \mathbb{R}^{d_Y}$ and $Z \in \mathbb{R}^{d_Z}$, requiring only i.i.d. samples and that $Y$ not be almost surely constant. The estimator converges almost surely to a limit in $[0, 1]$ that equals 0 if and only if $Y$ and $Z$ are independent, and equals 1 if and only if $Y$ is almost surely a measurable function of $Z$. It also supports consistent estimation of conditional dependence given a third random vector, and the limiting measure is monotonic with respect to the deviation from independence under certain model restrictions. Constructed from ranks and nearest neighbors, the coefficient can be computed by a merge-sort-based algorithm in $O(n (\log n)^{d_Y})$ time, and it is asymptotically normal under independence, yielding a consistent test of independence. Numerical experiments demonstrate robustness in high dimensions and competitive statistical power.
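The summary does not spell out the paper's multivariate estimator, but the rank-and-nearest-neighbor recipe it generalizes is the original scalar-$Y$ Azadkia–Chatterjee coefficient $T_n$. Below is a minimal Python sketch of that baseline; the function name `azadkia_chatterjee` and the k-d tree nearest-neighbor lookup are illustrative choices, not the paper's merge-sort algorithm, and the sketch assumes $Z$ has no duplicated rows.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.stats import rankdata

def azadkia_chatterjee(y, Z):
    """Original scalar-Y Azadkia-Chatterjee coefficient T_n(Y, Z).

    y : (n,) array of responses; Z : (n, d_Z) array of covariates.
    Assumes no duplicated rows in Z, so every point has a unique
    nearest neighbor other than itself.
    """
    y = np.asarray(y, dtype=float)
    Z = np.asarray(Z, dtype=float).reshape(len(y), -1)
    n = len(y)

    # R_i = #{j : Y_j <= Y_i} and L_i = #{j : Y_j >= Y_i}
    R = rankdata(y, method="max")
    L = rankdata(-y, method="max")

    # N(i) = index of the nearest neighbor of Z_i among the other points
    # (k=2 because the closest hit is the point itself)
    _, idx = cKDTree(Z).query(Z, k=2)
    N = idx[:, 1]

    # T_n = sum_i (n * min(R_i, R_{N(i)}) - L_i^2) / sum_i L_i * (n - L_i)
    num = np.sum(n * np.minimum(R, R[N]) - L ** 2)
    den = np.sum(L * (n - L))
    return num / den

# Sanity check: functional dependence vs. independence
rng = np.random.default_rng(0)
Z = rng.normal(size=(2000, 2))
print(azadkia_chatterjee(np.sum(Z ** 2, axis=1), Z))  # large, approaching 1
print(azadkia_chatterjee(rng.normal(size=2000), Z))   # near 0
```

When $Y$ is a deterministic function of $Z$ the statistic approaches 1 as $n$ grows, and under independence it hovers near 0, mirroring the limit properties claimed for the multivariate extension.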
📝 Abstract
The Azadkia–Chatterjee coefficient is a rank-based measure of dependence between a random variable $Y \in \mathbb{R}$ and a random vector ${\boldsymbol Z} \in \mathbb{R}^{d_Z}$. This paper proposes a multivariate extension that measures dependence between random vectors ${\boldsymbol Y} \in \mathbb{R}^{d_Y}$ and ${\boldsymbol Z} \in \mathbb{R}^{d_Z}$, based on $n$ i.i.d. samples. The proposed coefficient converges almost surely to a limit with the following properties: i) it lies in $[0, 1]$; ii) it equals zero if and only if ${\boldsymbol Y}$ and ${\boldsymbol Z}$ are independent; and iii) it equals one if and only if ${\boldsymbol Y}$ is almost surely a function of ${\boldsymbol Z}$. Remarkably, the only assumption required for this convergence is that ${\boldsymbol Y}$ is not almost surely a constant. We further prove that, under the same mild condition, the coefficient is asymptotically normal when ${\boldsymbol Y}$ and ${\boldsymbol Z}$ are independent, and we propose a merge-sort-based algorithm that computes the coefficient in time complexity $O(n (\log n)^{d_Y})$. Finally, we show that it can be used to measure conditional dependence between ${\boldsymbol Y}$ and ${\boldsymbol Z}$ conditional on a third random vector ${\boldsymbol X}$, and we prove that the measure is monotonic with respect to the deviation from an independence distribution under certain model restrictions.
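As a usage sketch: the asymptotic normality under independence is what calibrates the paper's test, but without the limiting variance at hand, any such dependence statistic can instead be calibrated by permutation. The generic helper below assumes a statistic like the `azadkia_chatterjee` sketch above; `permutation_pvalue` is a hypothetical name, not a function from the paper.

```python
import numpy as np

def permutation_pvalue(y, Z, stat, n_perm=500, seed=0):
    """Permutation p-value for H0: Y independent of Z, given any
    dependence statistic stat(y, Z), e.g. the sketch above.
    Permuting y breaks dependence while preserving both marginals,
    so the permuted statistics approximate the null distribution."""
    rng = np.random.default_rng(seed)
    observed = stat(y, Z)
    null = np.array([stat(rng.permutation(y), Z) for _ in range(n_perm)])
    # add-one correction keeps the p-value strictly positive
    return (1 + np.sum(null >= observed)) / (1 + n_perm)
```

Permutation calibration is exact in level regardless of sample size, at the cost of `n_perm` recomputations of the statistic, whereas the paper's normal limit gives a test with no resampling.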