🤖 AI Summary
This paper studies the sample complexity of estimating or maximising an unknown function in a reproducing kernel Hilbert space via Gaussian process regression, where rates are governed by two kernel-dependent quantities: the information gain, which has an appealing information-theoretic interpretation, and the effective dimension, which typically yields better rates. The paper introduces the *relative information gain*, a quantity that measures the sensitivity of the information gain to the observation noise. The relative information gain smoothly interpolates between the effective dimension and the information gain, while matching the growth rate of the effective dimension. The paper then proves a PAC-Bayesian excess risk bound for Gaussian process regression whose complexity term naturally involves the relative information gain, together with upper bounds on the relative information gain that depend on the spectral properties of the kernel. Combining these bounds with the excess risk bound yields minimax-optimal rates of convergence.
📝 Abstract
The sample complexity of estimating or maximising an unknown function in a reproducing kernel Hilbert space is known to be linked to both the effective dimension and the information gain associated with the kernel. While the information gain has an attractive information-theoretic interpretation, the effective dimension typically results in better rates. We introduce a new quantity called the relative information gain, which measures the sensitivity of the information gain with respect to the observation noise. We show that the relative information gain smoothly interpolates between the effective dimension and the information gain, and that the relative information gain has the same growth rate as the effective dimension. In the second half of the paper, we prove a new PAC-Bayesian excess risk bound for Gaussian process regression. The relative information gain arises naturally from the complexity term in this PAC-Bayesian bound. We prove bounds on the relative information gain that depend on the spectral properties of the kernel. When these upper bounds are combined with our excess risk bound, we obtain minimax-optimal rates of convergence.
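The abstract's claim that a noise-sensitivity quantity can grow at the same rate as the effective dimension has a concrete precedent in a standard identity relating the two classical quantities. The sketch below uses the usual definitions of information gain and effective dimension for a kernel matrix; it is an illustrative derivation under those assumed definitions, not the paper's exact definition of relative information gain.

```latex
% Standard quantities for an n x n kernel matrix K with eigenvalues
% \lambda_1, \dots, \lambda_n and observation-noise variance \sigma^2
% (assumed definitions; the paper's relative information gain may differ).
\gamma(\sigma^2)
  = \tfrac{1}{2}\log\det\!\left(I + \sigma^{-2}K\right)
  = \tfrac{1}{2}\sum_{i=1}^{n}\log\!\left(1 + \frac{\lambda_i}{\sigma^2}\right),
\qquad
d_{\mathrm{eff}}(\sigma^2)
  = \operatorname{tr}\!\left(K\left(K + \sigma^2 I\right)^{-1}\right)
  = \sum_{i=1}^{n}\frac{\lambda_i}{\lambda_i + \sigma^2}.
% Differentiating the information gain with respect to the noise variance
% recovers the effective dimension exactly:
-2\sigma^2\,\frac{\partial \gamma}{\partial \sigma^2}
  = \sum_{i=1}^{n}\frac{\lambda_i}{\lambda_i + \sigma^2}
  = d_{\mathrm{eff}}(\sigma^2).
```

This identity shows why a quantity defined through the noise-sensitivity of the information gain can plausibly share the growth rate of the effective dimension rather than that of the (typically larger) information gain.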