🤖 AI Summary
This paper studies the two-stage sampling setting in distribution regression, where input distributions are accessible only through finite samples, and strengthens the learning-theoretic guarantees for kernel methods. Within the Hilbert space embedding framework, the authors introduce the near-unbiased embedding condition, a novel regularity assumption that enables new two-stage sampling error bounds for kernels based on optimal transport and on mean embeddings. The analysis combines functional analysis with statistical learning theory and strictly improves the existing convergence rates for these kernel classes. Numerical experiments illustrate the accuracy and practical relevance of the theoretical predictions. The key idea is to decouple the control of the embedding bias from the sampling error analysis, yielding a unified and general theoretical treatment of kernel distribution regression.
📝 Abstract
The distribution regression problem encompasses many important statistical and machine learning tasks, and arises in a wide range of applications. Among the various existing approaches to this problem, kernel methods have become a method of choice. Indeed, kernel distribution regression is both computationally favorable and supported by a recent learning theory. This theory also tackles the two-stage sampling setting, where only samples from the input distributions are available. In this paper, we improve the learning theory of kernel distribution regression. We address kernels based on Hilbertian embeddings, which encompass most, if not all, of the existing approaches. We introduce the novel near-unbiased condition on the Hilbertian embeddings, which enables us to provide new error bounds on the effect of the two-stage sampling, thanks to a new analysis. We show that this near-unbiased condition holds for three important classes of kernels, based on optimal transport and mean embeddings. As a consequence, we strictly improve the existing convergence rates for these kernels. Our setting and results are illustrated by numerical experiments.
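To make the two-stage sampling setting concrete, here is a minimal sketch (not the paper's exact construction) of a standard kernel distribution regression pipeline based on empirical mean embeddings: each input distribution is observed only through a bag of samples (the second sampling stage), a Gaussian kernel induces empirical mean embeddings, the squared MMD between embeddings defines a second-level kernel on distributions, and kernel ridge regression is run on top. All bandwidths, bag sizes, and the synthetic target are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mmd2(X, Y, gamma=1.0):
    """Squared MMD between the empirical mean embeddings of samples X and Y,
    for the Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    def gram(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return gram(X, X).mean() + gram(Y, Y).mean() - 2.0 * gram(X, Y).mean()

# Two-stage sampling: each input distribution is seen only through a finite bag.
bags = [rng.normal(loc=m, scale=1.0, size=(50, 1)) for m in np.linspace(-2, 2, 10)]
# Illustrative target: a functional of the (unobserved) input distribution.
y = np.array([b.mean() ** 2 for b in bags])

# Second-level kernel on distributions via their mean embeddings:
# K(P, Q) = exp(-MMD^2(P_hat, Q_hat)).
n = len(bags)
K = np.empty((n, n))
for i in range(n):
    for j in range(n):
        K[i, j] = np.exp(-mmd2(bags[i], bags[j]))

# Kernel ridge regression over the bags (regularization chosen arbitrarily).
lam = 1e-3
alpha = np.linalg.solve(K + lam * np.eye(n), y)
y_hat = K @ alpha
```

The error bounds discussed above quantify how the finite bag size (here 50 samples per distribution) perturbs such a regressor relative to having direct access to the input distributions.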