🤖 AI Summary
This work addresses the computational cost of kernel-based independence tests such as HSIC, which rely on permutation-based calibration and therefore scale poorly. Adapting a recent martingale construction for MMD two-sample testing to the (joint) independence problem, the authors propose two studentized test statistics whose null distributions are asymptotically standard normal, which removes the need for permutations. The approach combines empirical kernel centering, a self-normalized lower-triangular sum of the Hadamard product of two centered Gram matrices, the martingale central limit theorem, and sample splitting. The mdHSIC variant achieves finite-sample consistency for multivariate joint independence through a single half-sample split. On synthetic data, both methods match the type-I error rates and statistical power of permutation-based baselines while running 25 to 60 times faster.
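For reference, one way to write the mHSIC statistic summarized above is shown below. This is a reading of the prose description, not the paper's verbatim definition: the centring matrix $C$, the increments $D_i$, the exact normalisation and the one-sided rejection rule are assumptions. Here $K$ and $L$ are the $n \times n$ Gram matrices of the two samples and $\Phi$ is the standard normal CDF.

```latex
\begin{aligned}
C &= I_n - \tfrac{1}{n}\mathbf{1}\mathbf{1}^{\top}, \qquad \tilde K = CKC, \qquad \tilde L = CLC,\\
D_i &= \sum_{j<i} \tilde K_{ij}\,\tilde L_{ij}, \qquad i = 2,\dots,n,\\
T_n &= \frac{\sum_{i=2}^{n} D_i}{\sqrt{\sum_{i=2}^{n} D_i^{2}}}, \qquad \text{reject } H_0 \text{ if } T_n > \Phi^{-1}(1-\alpha).
\end{aligned}
```

Under independence the population-centred products $\tilde K_{ij}\tilde L_{ij}$ have zero conditional mean, so the $D_i$ behave like martingale increments and self-normalisation yields a standard normal limit; under dependence their mean is positive and $T_n$ grows with $n$.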
📝 Abstract
The Hilbert-Schmidt Independence Criterion (HSIC) and its joint-independence extension $d\mathrm{HSIC}$ are degenerate $V$-statistics whose null limits are data-dependent weighted sums of $\chi^2$ variables. This forces a permutation calibration that multiplies the per-test cost by the number of permutations, in practice by about two orders of magnitude. Adapting the recent martingale MMD construction for two-sample testing to the (joint) independence problem, we introduce two studentised statistics whose null distributions are asymptotically standard normal regardless of the data law, so that a single normal-quantile lookup replaces the permutation step entirely. The first, $m\mathrm{HSIC}$, is a self-normalised lower-triangular sum of the Hadamard product of two empirically centred Gram matrices. Under independence, and for kernels with bounded fourth moments, it converges in distribution to a standard normal. It is consistent against every fixed alternative and runs at quadratic cost in the sample size without any sample split, matching the cost of the biased HSIC $V$-statistic. Our second statistic, $md\mathrm{HSIC}$, achieves finite-sample consistency with a single half-sample split: the centring is estimated on one half and the lower-triangular self-normalised martingale is run on the other. This shrinks the conditional-mean residual to a quantity that is exponentially small in the number $d$ of jointly tested variables, so the statistic is asymptotically standard normal for every fixed $d$, with a per-test cost that grows only linearly in $d$. On synthetic data with per-variable input dimension from $1$ to $500$ and between $2$ and $10$ jointly tested variables, both statistics match the empirical type-I error rate and test power of permutation-calibrated baselines while running $25\times$ to $60\times$ faster.
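The mHSIC construction described above can be sketched in plain Python. This is a minimal, hypothetical illustration rather than the authors' implementation: the function names (`rbf_gram`, `centre`, `mhsic_sketch`), the Gaussian kernel with unit bandwidth, scalar (one-dimensional) inputs, the normalisation $T = \sum_i D_i / \sqrt{\sum_i D_i^2}$ with $D_i = \sum_{j<i} \tilde K_{ij}\tilde L_{ij}$, and the one-sided rejection rule are all assumptions. The mdHSIC half-sample split is not shown.

```python
import math
import random
from statistics import NormalDist


def rbf_gram(xs, bandwidth=1.0):
    """Gaussian-kernel Gram matrix for scalar samples.

    Assumption: Gaussian kernel with a fixed bandwidth; the paper's kernel
    choice and bandwidth rule may differ.
    """
    return [[math.exp(-((u - v) ** 2) / (2.0 * bandwidth ** 2)) for v in xs] for u in xs]


def centre(gram):
    """Empirical double centring C K C with C = I - 11^T / n.

    For a symmetric Gram matrix, column means equal row means, so
    entry (i, j) is K_ij - rowmean_i - rowmean_j + grandmean.
    """
    n = len(gram)
    row = [sum(r) / n for r in gram]
    grand = sum(row) / n
    return [[gram[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def mhsic_sketch(xs, ys, bandwidth=1.0):
    """Hypothetical mHSIC: self-normalised lower-triangular sum of K~ o L~.

    Returns (T, one-sided p-value from the standard normal).
    Assumption: reject independence for large T.
    """
    kc = centre(rbf_gram(xs, bandwidth))
    lc = centre(rbf_gram(ys, bandwidth))
    n = len(xs)
    num, den = 0.0, 0.0
    for i in range(1, n):
        # Increment D_i: row i of the strictly lower triangle of the Hadamard product.
        d_i = sum(kc[i][j] * lc[i][j] for j in range(i))
        num += d_i
        den += d_i * d_i
    if den == 0.0:
        raise ValueError("degenerate sample: all increments are zero")
    t = num / math.sqrt(den)
    # A single normal-CDF evaluation replaces the permutation loop.
    return t, 1.0 - NormalDist().cdf(t)


rng = random.Random(0)
n = 200
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
ys_dep = [x + 0.3 * rng.gauss(0.0, 1.0) for x in xs]  # Y depends on X
ys_ind = [rng.gauss(0.0, 1.0) for _ in range(n)]       # Y independent of X

t_dep, p_dep = mhsic_sketch(xs, ys_dep)
t_ind, p_ind = mhsic_sketch(xs, ys_ind)
print(f"dependent:   T = {t_dep:.2f}, p = {p_dep:.3g}")
print(f"independent: T = {t_ind:.2f}, p = {p_ind:.3g}")
```

Building the two Gram matrices and the lower-triangular pass each cost $O(n^2)$, consistent with the quadratic cost stated above, and the p-value comes from one call to the standard normal CDF rather than from recomputing the statistic on hundreds of permuted samples.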