A kernel conditional two-sample test

📅 2025-06-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses conditional two-sample testing for data that are non-i.i.d., arrive sequentially, or depend on operational parameters, aiming to localize the regions of covariate space where the conditional distributions of two samples differ significantly. Methodologically, it introduces the first framework that transforms confidence bounds of a learning method into a conditional two-sample test, instantiated with kernel ridge regression and conditional kernel mean embeddings; it extends online confidence bounds to infinite-dimensional outputs and non-trace-class kernels, and proposes a parametric bootstrap thresholding procedure that leverages the theoretically identified form of the testing thresholds to avoid tuning inaccessible parameters. The theoretical analysis relies on concentration inequalities for vector-valued least-squares estimators and online learning tools, relaxing the i.i.d. assumption. Experiments demonstrate high detection accuracy and practical efficacy in process monitoring and in comparing dynamical systems.

📝 Abstract
We propose a framework for hypothesis testing on conditional probability distributions, which we then use to construct conditional two-sample statistical tests. These tests identify the inputs -- called covariates in this context -- where two conditional expectations differ with high probability. Our key idea is to transform confidence bounds of a learning method into a conditional two-sample test, and we instantiate this principle for kernel ridge regression (KRR) and conditional kernel mean embeddings. We generalize existing pointwise-in-time or time-uniform confidence bounds for KRR to previously-inaccessible yet essential cases such as infinite-dimensional outputs with non-trace-class kernels. These bounds enable circumventing the need for independent data in our statistical tests, since they allow online sampling. We also introduce bootstrapping schemes leveraging the parametric form of testing thresholds identified in theory to avoid tuning inaccessible parameters, making our method readily applicable in practice. Such conditional two-sample tests are especially relevant in applications where data arrive sequentially or non-independently, or when output distributions vary with operational parameters. We demonstrate their utility through examples in process monitoring and comparison of dynamical systems. Overall, our results establish a comprehensive foundation for conditional two-sample testing, from theoretical guarantees to practical implementation, and advance the state-of-the-art on the concentration of vector-valued least squares estimation.
Problem

Research questions and friction points this paper is trying to address.

Testing differences in conditional probability distributions between two samples
Developing kernel-based tests for non-independent or sequential data scenarios
Generalizing confidence bounds for kernel ridge regression with infinite-dimensional outputs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transforms learning confidence bounds into conditional tests
Generalizes KRR bounds for infinite-dimensional outputs
Introduces bootstrapping to avoid parameter tuning
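The core idea above (fit a regressor to each sample, then flag covariates where the fitted conditional means differ by more than a calibrated threshold) can be sketched in a few lines. The following is a deliberately simplified illustration, not the paper's method: it uses scalar outputs instead of kernel mean embeddings, and calibrates the threshold with a permutation bootstrap over pooled data rather than the paper's KRR confidence bounds; all function names and parameter defaults here are hypothetical choices for the demo.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def krr_fit(X, y, lam, gamma):
    """Fit kernel ridge regression; return a prediction function."""
    n = len(X)
    alpha = np.linalg.solve(rbf_kernel(X, X, gamma) + lam * n * np.eye(n), y)
    return lambda Z: rbf_kernel(Z, X, gamma) @ alpha

def conditional_two_sample_test(X1, y1, X2, y2, grid,
                                lam=1e-3, gamma=20.0,
                                n_boot=100, level=0.05, seed=0):
    """Flag grid points where the two conditional means appear to differ.

    Simplified sketch: the threshold is calibrated by a permutation
    bootstrap over the pooled sample, not by the paper's theoretical
    confidence bounds, and outputs are scalar rather than RKHS-valued.
    """
    rng = np.random.default_rng(seed)
    f1 = krr_fit(X1, y1, lam, gamma)
    f2 = krr_fit(X2, y2, lam, gamma)
    stat = np.abs(f1(grid) - f2(grid))  # pointwise discrepancy on the grid

    # Null distribution: refit on label-permuted pooled samples and record
    # the maximal discrepancy, yielding a grid-uniform threshold.
    Xp, yp = np.vstack([X1, X2]), np.concatenate([y1, y2])
    n1 = len(X1)
    null_max = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.permutation(len(Xp))
        g1 = krr_fit(Xp[idx[:n1]], yp[idx[:n1]], lam, gamma)
        g2 = krr_fit(Xp[idx[n1:]], yp[idx[n1:]], lam, gamma)
        null_max[b] = np.abs(g1(grid) - g2(grid)).max()
    thresh = np.quantile(null_max, 1 - level)
    return stat > thresh, stat, thresh
```

On synthetic data whose conditional means agree for x < 0.5 and differ by a constant shift for x > 0.5, the returned boolean mask localizes the rejections to the shifted half of the covariate space, which is the localization behavior the paper's tests formalize with rigorous guarantees.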