🤖 AI Summary
This work addresses the absence of proven convergence rates faster than $O(n^{-1/2})$ for kernel herding in infinite-dimensional reproducing kernel Hilbert spaces (RKHS). The authors study a deterministic sampling framework grounded in Gibbs measures: quadrature node configurations are drawn from a joint distribution whose support concentrates on minimizers of the same worst-case integration error targeted by kernel herding. This appears to be the first systematic use of Gibbs measure theory in the analysis of deterministic numerical integration. Theoretically, the proposed distribution outperforms i.i.d. Monte Carlo in the sense of a tighter concentration inequality on the worst-case integration error in the RKHS, although it does not yet improve on the $O(n^{-1/2})$ rate itself. Empirically, preliminary experiments suggest a faster-than-root-$n$ convergence rate beyond the worst-case setting. The core contribution is bringing the mathematical tools of Gibbs measures to bear on worst-case error analysis, offering a theoretically grounded route toward explaining the acceleration observed for kernel herding and its variants.
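For context, the worst-case integration error that both kernel herding and the proposed Gibbs-measure approach minimize has a standard closed form in an RKHS: it equals the distance in $\mathcal{H}$ between the kernel mean embedding of the target measure and the empirical embedding of the nodes (this identity is textbook RKHS material, not specific to this paper; $\pi$ denotes the target measure and $k$ the reproducing kernel):

$$
\sup_{\|f\|_{\mathcal{H}} \le 1} \left| \frac{1}{n} \sum_{i=1}^{n} f(x_i) - \int f \, \mathrm{d}\pi \right|
= \left\| \mu_\pi - \frac{1}{n} \sum_{i=1}^{n} k(x_i, \cdot) \right\|_{\mathcal{H}},
\qquad
\mu_\pi = \int k(x, \cdot) \, \mathrm{d}\pi(x).
$$

This is why node-selection rules can be analyzed purely through kernel evaluations: the supremum over an infinite-dimensional ball reduces to a computable quantity.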
📝 Abstract
Kernel herding belongs to a family of deterministic quadratures that seek to minimize the worst-case integration error over a reproducing kernel Hilbert space (RKHS). In spite of strong experimental support, it has proven difficult to show that this worst-case error decreases faster than the standard $O(n^{-1/2})$ rate in the number of quadrature nodes, at least in the usual case where the RKHS is infinite-dimensional. In this theoretical paper, we study a joint probability distribution over quadrature nodes whose support tends to minimize the same worst-case error as kernel herding. We prove that it does outperform i.i.d. Monte Carlo, in the sense of coming with a tighter concentration inequality on the worst-case integration error. While not yet improving the rate, this demonstrates that the mathematical tools used to study Gibbs measures can help understand to what extent kernel herding and its variants improve on computationally cheaper methods. Moreover, we provide early experimental evidence that a faster rate of convergence, though not worst-case, is likely.
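To make the baseline concrete, the following is a minimal, illustrative sketch of the greedy kernel herding rule the paper compares against: at each step, pick the node maximizing the kernel mean embedding of the target minus the average similarity to already-selected nodes. Everything here is an assumption for illustration, not the paper's setup: a Gaussian kernel on $\mathbb{R}$, a standard normal target whose mean embedding is approximated with a large i.i.d. sample, and optimization by exhaustive search over a candidate grid.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel k(x, y); broadcasts over arrays."""
    return np.exp(-(x - y) ** 2 / (2 * sigma ** 2))

def kernel_herding(candidates, mu, n_nodes):
    """Greedy herding: x_{t+1} = argmax_x mu(x) - (1/(t+1)) * sum_i k(x, x_i),
    where mu is the (approximate) kernel mean embedding evaluated on the grid."""
    nodes = []
    for t in range(n_nodes):
        if nodes:
            penalty = np.sum(
                [gaussian_kernel(candidates, x) for x in nodes], axis=0
            ) / (t + 1)
        else:
            penalty = 0.0
        nodes.append(candidates[np.argmax(mu - penalty)])
    return np.array(nodes)

rng = np.random.default_rng(0)
candidates = np.linspace(-4.0, 4.0, 801)
# Approximate mu(x) = E_{X ~ N(0,1)}[k(x, X)] with a large i.i.d. sample.
sample = rng.standard_normal(10_000)
mu = gaussian_kernel(candidates[:, None], sample[None, :]).mean(axis=1)

nodes = kernel_herding(candidates, mu, n_nodes=20)
```

The selected nodes spread out over the high-probability region of the target rather than clustering, which is the behavior whose empirical faster-than-root-$n$ convergence the paper seeks to explain.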