🤖 AI Summary
This paper studies the two-stage sampling setting in distribution regression, where input distributions are accessible only through finite samples, and strengthens the learning-theoretic guarantees for kernel methods. Within the Hilbert space embedding framework, the authors introduce the near-unbiased embedding condition, a novel regularity assumption that enables new two-stage sampling error bounds for kernels based on optimal transport and on mean embeddings. The analysis combines functional analysis with statistical learning theory and strictly improves the existing convergence rates for these kernel classes. Numerical experiments illustrate the accuracy and practical relevance of the theoretical predictions. The key idea is to decouple the control of the embedding bias from the sampling error analysis, yielding a unified and general theoretical treatment of kernel distribution regression.
📝 Abstract
The distribution regression problem encompasses many important statistical and machine learning tasks, and arises in a wide range of applications. Among the various existing approaches to this problem, kernel methods have become a method of choice. Indeed, kernel distribution regression is both computationally favorable and supported by a recent learning theory. This theory also tackles the two-stage sampling setting, where only samples from the input distributions are available. In this paper, we improve the learning theory of kernel distribution regression. We address kernels based on Hilbertian embeddings, which encompass most, if not all, of the existing approaches. We introduce the novel near-unbiased condition on the Hilbertian embeddings, which enables us to provide new error bounds on the effect of the two-stage sampling, thanks to a new analysis. We show that this near-unbiased condition holds for three important classes of kernels, based on optimal transport and mean embeddings. As a consequence, we strictly improve the existing convergence rates for these kernels. Our setting and results are illustrated by numerical experiments.
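To make the two-stage sampling setting concrete, here is a minimal sketch (not the paper's exact construction) of a standard kernel distribution regression pipeline based on empirical mean embeddings: each input distribution is observed only through a bag of samples (the second sampling stage), a Gaussian kernel induces empirical mean embeddings, the squared MMD between embeddings defines a second-level kernel on distributions, and kernel ridge regression is run on top. All bandwidths, bag sizes, and the synthetic target are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mmd2(X, Y, gamma=1.0):
    """Squared MMD between the empirical mean embeddings of samples X and Y,
    for the Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    def gram(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return gram(X, X).mean() + gram(Y, Y).mean() - 2.0 * gram(X, Y).mean()

# Two-stage sampling: each input distribution is seen only through a finite bag.
bags = [rng.normal(loc=m, scale=1.0, size=(50, 1)) for m in np.linspace(-2, 2, 10)]
# Illustrative target: a functional of the (unobserved) input distribution.
y = np.array([b.mean() ** 2 for b in bags])

# Second-level kernel on distributions via their mean embeddings:
# K(P, Q) = exp(-MMD^2(P_hat, Q_hat)).
n = len(bags)
K = np.empty((n, n))
for i in range(n):
    for j in range(n):
        K[i, j] = np.exp(-mmd2(bags[i], bags[j]))

# Kernel ridge regression over the bags (regularization chosen arbitrarily).
lam = 1e-3
alpha = np.linalg.solve(K + lam * np.eye(n), y)
y_hat = K @ alpha
```

The error bounds discussed above quantify how the finite bag size (here 50 samples per distribution) perturbs such a regressor relative to having direct access to the input distributions.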