Conditional Distribution Compression via the Kernel Conditional Mean Embedding

๐Ÿ“… 2025-04-14
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
This work addresses the open problem of compressing the conditional distribution of labelled data. It proposes the first dedicated metric for this task, the Average Maximum Conditional Mean Discrepancy (AMCMD), and develops two linear-time algorithms: the greedy Average Conditional Kernel Herding (ACKH) and the joint optimisation method Average Conditional Kernel Inducing Points (ACKIP). The authors formally define a distance between conditional distributions, prove the consistency of the AMCMD estimator and establish its rate of convergence, and show that the cost of constructing a compressed set targeting the AMCMD can be reduced from $\mathcal{O}(n^3)$ to $\mathcal{O}(n)$. They further demonstrate the statistical advantage of compressing the conditional distribution directly over compressing the joint distribution (as in the JKH and JKIP baselines). Multi-task experiments show that ACKIP outperforms these alternatives, bridging a gap in both the theoretical foundations and the algorithmic design of conditional distribution compression.

๐Ÿ“ Abstract
Existing distribution compression methods, like Kernel Herding (KH), were originally developed for unlabelled data. However, no existing approach directly compresses the conditional distribution of labelled data. To address this gap, we first introduce the Average Maximum Conditional Mean Discrepancy (AMCMD), a natural metric for comparing conditional distributions. We then derive a consistent estimator for the AMCMD and establish its rate of convergence. Next, we make a key observation: in the context of distribution compression, the cost of constructing a compressed set targeting the AMCMD can be reduced from $\mathcal{O}(n^3)$ to $\mathcal{O}(n)$. Building on this, we extend the idea of KH to develop Average Conditional Kernel Herding (ACKH), a linear-time greedy algorithm that constructs a compressed set targeting the AMCMD. To better understand the advantages of directly compressing the conditional distribution rather than doing so via the joint distribution, we introduce Joint Kernel Herding (JKH), a straightforward adaptation of KH designed to compress the joint distribution of labelled data. While herding methods provide a simple and interpretable selection process, they rely on a greedy heuristic. To explore alternative optimisation strategies, we propose Joint Kernel Inducing Points (JKIP) and Average Conditional Kernel Inducing Points (ACKIP), which jointly optimise the compressed set while maintaining linear complexity. Experiments show that directly preserving conditional distributions with ACKIP outperforms both joint distribution compression (via JKH and JKIP) and the greedy selection used in ACKH. Moreover, we see that JKIP consistently outperforms JKH.
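To make the herding idea concrete: standard KH greedily picks points whose empirical mean embedding best matches that of the full sample. The sketch below is a minimal toy implementation of that baseline (not the paper's ACKH/ACKIP, which target the conditional embedding and achieve linear cost); the Gaussian kernel, bandwidth, and the $\mathcal{O}(n^2)$ precomputed kernel matrix are illustrative choices, not taken from the paper.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel between rows of a and b.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def kernel_herding(X, m, sigma=1.0):
    """Greedily select m indices from X whose empirical mean embedding
    best matches that of the full sample (plain KH on candidates from X).
    Toy version: builds the full n x n kernel matrix, so O(n^2) memory."""
    K = gaussian_kernel(X, X, sigma)
    target = K.mean(axis=1)            # (1/n) * sum_i k(x, x_i) for each candidate x
    chosen = []
    for t in range(m):
        # Score = attraction to the target embedding minus repulsion
        # from the points already selected.
        penalty = K[:, chosen].sum(axis=1) / (t + 1) if chosen else 0.0
        scores = target - penalty
        scores[chosen] = -np.inf       # select without replacement
        chosen.append(int(np.argmax(scores)))
    return np.array(chosen)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
idx = kernel_herding(X, 10)            # indices of a 10-point compressed set
```

The attraction/repulsion structure of the score is what the conditional variants generalise: ACKH replaces the marginal mean embedding with a conditional mean embedding target.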
Problem

Research questions and friction points this paper is trying to address.

Compress conditional distribution of labelled data
Introduce AMCMD metric for conditional distributions
Develop linear-time algorithms for distribution compression
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces AMCMD for conditional distribution comparison
Develops ACKH linear-time greedy algorithm
Proposes JKIP and ACKIP for joint optimization
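The AMCMD averages, over inputs $x$, the MMD between conditional distributions estimated via the kernel conditional mean embedding. The sketch below shows a naive plug-in comparison of two conditional embeddings at query points; the regularised ridge weights, RBF kernels, and parameter values are illustrative assumptions, and the $\mathcal{O}(n^3)$ solve is exactly the cost the paper's compressed-set construction avoids.

```python
import numpy as np

def rbf(a, b, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel between rows of a and b.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def cme_weights(X, x_query, lam=1e-3, sigma=1.0):
    """Weights alpha(x) of the regularised conditional mean embedding
    estimate: mu_{Y|X=x} ~ sum_i alpha_i(x) k_Y(., y_i).
    Naive version: one O(n^3) linear solve over the full sample."""
    n = len(X)
    Kx = rbf(X, X, sigma)
    kq = rbf(X, x_query, sigma)                  # (n, m) cross-kernel
    return np.linalg.solve(Kx + lam * n * np.eye(n), kq)

def mcmd_sq(X1, Y1, X2, Y2, x_query, lam=1e-3):
    """Squared MCMD between two estimated conditional embeddings at each
    query point; averaging over queries gives an AMCMD-style estimate."""
    a = cme_weights(X1, x_query, lam)            # (n1, m)
    b = cme_weights(X2, x_query, lam)            # (n2, m)
    t11 = np.einsum('im,jm,ij->m', a, a, rbf(Y1, Y1))
    t22 = np.einsum('im,jm,ij->m', b, b, rbf(Y2, Y2))
    t12 = np.einsum('im,jm,ij->m', a, b, rbf(Y1, Y2))
    return t11 + t22 - 2 * t12
```

Comparing a compressed set against the full sample with this quantity is the kind of objective ACKH minimises greedily and ACKIP minimises by joint optimisation.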
๐Ÿ”Ž Similar Papers
No similar papers found.
Dominic Broadbent
School of Mathematics, University of Bristol, Bristol, United Kingdom
Nick Whiteley
University of Bristol (Topological and Geometric Data Analysis, Networks and Graphs, Uncertainty Dynamics)
Robert Allison
School of Mathematics, University of Bristol, Bristol, United Kingdom
Tom Lovett
Mathematical Institute, University of Oxford, Oxford, United Kingdom