Sequential Kernelized Independence Testing

📅 2022-12-14
🏛️ International Conference on Machine Learning
📈 Citations: 21
Influential: 4
🤖 AI Summary
This work addresses independence testing for streaming data by proposing a sequential, kernel-based independence test that supports real-time updates and adaptive stopping. Unlike conventional batch tests, whose validity is compromised by data peeking and non-stationary distributions, it brings the "testing by betting" paradigm to kernel independence testing. Specifically, the authors construct sequential betting strategies based on kernel dependence measures such as the Hilbert-Schmidt Independence Criterion (HSIC), using martingale and optional stopping arguments to guarantee strict control of the Type-I error at any stopping time, even under non-i.i.d., time-varying data. Experiments indicate that the method reduces average sample complexity by 30%-50% while maintaining nominal error rates, improving both detection power and sampling efficiency.
📝 Abstract
Independence testing is a classical statistical problem that has been extensively studied in the batch setting when one fixes the sample size before collecting data. However, practitioners often prefer procedures that adapt to the complexity of a problem at hand instead of setting sample size in advance. Ideally, such procedures should (a) stop earlier on easy tasks (and later on harder tasks), hence making better use of available resources, and (b) continuously monitor the data and efficiently incorporate statistical evidence after collecting new data, while controlling the false alarm rate. Classical batch tests are not tailored for streaming data: valid inference after data peeking requires correcting for multiple testing which results in low power. Following the principle of testing by betting, we design sequential kernelized independence tests that overcome such shortcomings. We exemplify our broad framework using bets inspired by kernelized dependence measures, e.g., the Hilbert-Schmidt independence criterion. Our test is also valid under non-i.i.d., time-varying settings. We demonstrate the power of our approaches on both simulated and real data.
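The testing-by-betting principle described in the abstract can be sketched as a wealth process: a bettor starts with wealth 1, and at each step multiplies it by 1 + λ·g_t, where g_t is a bounded payoff with zero mean under the null of independence. Ville's inequality then guarantees that rejecting once wealth exceeds 1/α controls the Type-I error at level α at any stopping time. The sketch below is a minimal illustration of that idea, assuming a toy sign-agreement payoff rather than the paper's HSIC-based bets; the function name and fixed betting fraction `rate` are illustrative.

```python
import numpy as np

def sequential_betting_test(xs, ys, alpha=0.05, rate=0.5):
    """Toy testing-by-betting sketch (not the paper's HSIC-based bet).

    Wealth starts at 1.  Each round we bet a fraction `rate` of a
    bounded payoff g_t in [-1, 1] whose expectation is 0 under
    independence.  By Ville's inequality, rejecting when the wealth
    exceeds 1/alpha controls the Type-I error at level alpha at any
    stopping time.
    """
    wealth = 1.0
    mx, my, n = 0.0, 0.0, 0  # running means used to centre observations
    for t, (x, y) in enumerate(zip(xs, ys), start=1):
        if n > 0:
            # payoff: sign agreement of centred coordinates, in [-1, 1];
            # zero-mean under independence (for symmetric marginals)
            g = np.sign(x - mx) * np.sign(y - my)
            wealth *= 1.0 + rate * g
            if wealth >= 1.0 / alpha:
                return True, t  # reject H0 (independence) at time t
        # update running means with the new observation
        n += 1
        mx += (x - mx) / n
        my += (y - my) / n
    return False, len(xs)  # never rejected: evidence insufficient
```

On strongly dependent pairs the wealth grows geometrically and the test stops early; under independence the wealth is a nonnegative martingale, so false alarms are rare. This is exactly the adaptive behavior the abstract describes: easy problems stop sooner, hard ones accumulate evidence longer.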
Problem

Research questions and friction points this paper is trying to address.

Adaptive independence testing for streaming data
Control false alarm rate while incorporating new evidence
Valid under non-i.i.d., time-varying settings
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sequential kernelized independence testing for streaming data
Adaptive sample size based on problem complexity
Valid under non-i.i.d., time-varying conditions
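The kernelized dependence measure underlying the bets is HSIC. For reference, a standard biased empirical HSIC estimator with Gaussian kernels is trace(KHLH)/n², where K and L are the Gram matrices of the two samples and H is the centering matrix; a minimal sketch is below (the function name and bandwidth parameter `sigma` are illustrative, and this is the plain batch estimator, not the paper's sequential bet).

```python
import numpy as np

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC with Gaussian kernels: trace(K H L H) / n^2."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n = len(x)

    def gram(z):
        # pairwise squared distances -> Gaussian kernel Gram matrix
        sq = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
        return np.exp(-sq / (2.0 * sigma ** 2))

    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    K, L = gram(x), gram(y)
    return np.trace(K @ H @ L @ H) / n ** 2
```

The statistic is the squared Hilbert-Schmidt norm of an empirical cross-covariance operator, so it is nonnegative and is (near) zero for independent samples while growing with dependence, which is what makes it a natural building block for the betting payoffs.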
A. Podkopaev
Carnegie Mellon University, Amazon Web Services
Patrick Blöbaum
Amazon (AWS)
S. Kasiviswanathan
Amazon Web Services
Aaditya Ramdas
Associate Professor (with tenure), Carnegie Mellon University
Machine Learning · Statistics