🤖 AI Summary
Efficiently embedding probability measures into Hilbert spaces remains computationally prohibitive at scale due to the high cost of linearized optimal transport (LOT) and kernel mean embeddings (KME).
Method: We propose a low-support discrete approximation framework based on measure quantization, bypassing LOT and KME while preserving geometric structure.
Contribution/Results: We establish, for the first time, the statistical consistency of measure quantization approximations, theoretically guaranteeing scalability and structural fidelity in Hilbert space embedding. The method unifies optimal transport, measure quantization, and kernel methods, balancing theoretical rigor with practical deployability. Experiments demonstrate controllable embedding error, 10–100× speedup in computation, and no degradation in downstream learning performance.
📝 Abstract
This paper is focused on statistical learning from data that come as probability measures. In this setting, popular approaches consist in embedding such data into a Hilbert space with either Linearized Optimal Transport or Kernel Mean Embedding. However, the cost of computing such embeddings prohibits their direct use in large-scale settings. We study two methods based on measure quantization for approximating input probability measures with discrete measures of small-support size. The first one is based on optimal quantization of each input measure, while the second one relies on mean-measure quantization. We study the consistency of such approximations, and its implication for scalable embeddings of probability measures into a Hilbert space at a low computational cost. We finally illustrate our findings with various numerical experiments.