Measuring Differences between Conditional Distributions using Kernel Embeddings

📅 2026-05-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the problem of measuring discrepancies between conditional distributions in statistics and machine learning. The authors propose conditional maximum mean discrepancy (CMMD), a unified framework built on reproducing kernel Hilbert space (RKHS) embeddings. The framework defines a family of levels: CMMD₀ (conditional mean operators), CMMD₁ (conditional mean embeddings), CMMD₂ (joint mean embeddings), and a general level-s CMMD, and it connects these levels through operator-based smoothing. The paper also introduces a doubly robust estimator that remains consistent provided at least one of its underlying models is correctly specified. Numerical experiments show that CMMD captures complex conditional dependencies in statistical testing.

📝 Abstract

Comparing conditional distributions is a fundamental challenge in statistics and machine learning, with applications across a wide range of domains. While previously proposed methods for measuring discrepancies using kernel embeddings of distributions in a reproducing kernel Hilbert space (RKHS) provide powerful non-parametric techniques, the existing literature remains fragmented and lacks a unified theoretical treatment. This paper addresses this gap by establishing a coherent framework for studying kernel-based methods to measure divergence between conditional distributions through what we refer to as conditional maximum mean discrepancy (CMMD). The CMMD consists of a family of metrics that we call levels, with three special cases each using a different type of RKHS embedding: CMMD$_0$ (conditional mean operators), CMMD$_1$ (conditional mean embeddings), and CMMD$_2$ (joint mean embeddings). We additionally introduce a general level-$s$ CMMD, clarify the required assumptions, and establish mathematical connections between the levels through the lens of operator-based smoothing. In addition to reviewing previously proposed estimators, we introduce a novel doubly robust estimator for the CMMD that maintains consistency provided at least one of the underlying models is correctly specified. We provide numerical experiments demonstrating that the CMMD effectively captures complex conditional dependencies for statistical testing.
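
To make the idea of comparing conditional distributions through RKHS embeddings concrete, the following is a minimal sketch, not the paper's estimator. It assumes the standard kernel-ridge form of an empirical conditional mean embedding, with weights $\beta(x) = (K + n\lambda I)^{-1} k_x$, and evaluates the squared RKHS distance between two such embeddings at chosen query points. The function names (`gaussian_gram`, `cme_weights`, `pointwise_cmmd_sq`), the Gaussian kernel, and the default values of `reg` and `bandwidth` are all illustrative assumptions rather than choices taken from the abstract.

```python
import numpy as np


def gaussian_gram(a, b, bandwidth=1.0):
    """Gaussian kernel Gram matrix between rows of a (n, d) and b (m, d)."""
    sq = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def cme_weights(x_train, x_eval, reg=1e-2, bandwidth=1.0):
    """Kernel-ridge weights beta(x) = (K + n*reg*I)^{-1} k_x, one column per query."""
    n = x_train.shape[0]
    gram = gaussian_gram(x_train, x_train, bandwidth)
    k_eval = gaussian_gram(x_train, x_eval, bandwidth)
    return np.linalg.solve(gram + n * reg * np.eye(n), k_eval)


def pointwise_cmmd_sq(xp, yp, xq, yq, x_eval, reg=1e-2, bandwidth=1.0):
    """Squared RKHS distance between two estimated conditional mean
    embeddings, evaluated at each row of x_eval (hypothetical helper)."""
    bp = cme_weights(xp, x_eval, reg, bandwidth)  # shape (n_p, m)
    bq = cme_weights(xq, x_eval, reg, bandwidth)  # shape (n_q, m)
    lpp = gaussian_gram(yp, yp, bandwidth)
    lqq = gaussian_gram(yq, yq, bandwidth)
    lpq = gaussian_gram(yp, yq, bandwidth)
    # ||mu_P - mu_Q||^2 = <mu_P, mu_P> - 2 <mu_P, mu_Q> + <mu_Q, mu_Q>
    term_pp = np.einsum("im,ij,jm->m", bp, lpp, bp)
    term_qq = np.einsum("im,ij,jm->m", bq, lqq, bq)
    term_pq = np.einsum("im,ij,jm->m", bp, lpq, bq)
    return term_pp - 2.0 * term_pq + term_qq
```

Calling `pointwise_cmmd_sq(xp, yp, xq, yq, x_eval)` with two-dimensional arrays (samples in rows) returns one value per query point. Under these assumptions, the value is zero when both samples are identical and grows as the two conditional distributions of `y` given `x` move apart. Averaging over query points, or turning the statistic into a test, would need further choices that this sketch does not make.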

Problem

Research questions and friction points this paper is trying to address.

conditional distributions

kernel embeddings

maximum mean discrepancy

RKHS

distribution comparison

Innovation

Methods, ideas, or system contributions that make the work stand out.

conditional maximum mean discrepancy

kernel embeddings

reproducing kernel Hilbert space