🤖 AI Summary
This work addresses the lack of generalization theory for kernel ridge regression (KRR) under non-i.i.d. data, focusing on dependent data with signal-noise causal structure: for example, multiple noisy observations sharing a common latent signal, as arises in denoising score learning. We propose a blockwise decomposition framework that, for the first time, systematically characterizes the joint influence of the kernel spectrum, causal strength, and sampling mechanism on generalization error. By integrating causal modeling with spectral analysis, we derive an explicit upper bound on the excess risk, revealing how structural dependencies either exacerbate or mitigate overfitting. Our results yield interpretable, theoretically grounded sampling strategies for denoising score learning and provide rigorous generalization guarantees, thereby filling a critical gap in KRR theory for structured dependent data.
📝 Abstract
Kernel ridge regression (KRR) is a foundational tool in machine learning, with recent work emphasizing its connections to neural networks. However, existing theory primarily addresses the i.i.d. setting, while real-world data often exhibits structured dependencies, particularly in applications like denoising score learning where multiple noisy observations derive from shared underlying signals. We present the first systematic study of KRR generalization for non-i.i.d. data with signal-noise causal structure, where observations represent different noisy views of common signals. By developing a novel blockwise decomposition method that enables precise concentration analysis for dependent data, we derive excess risk bounds for KRR that explicitly depend on: (1) the kernel spectrum, (2) causal structure parameters, and (3) sampling mechanisms (including relative sample sizes for signals and noises). We further apply our results to denoising score learning, establishing generalization guarantees and providing principled guidance for sampling noisy data points. This work advances KRR theory while providing practical tools for analyzing dependent data in modern machine learning applications.
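For readers unfamiliar with the estimator being analyzed, the following is a minimal KRR sketch in plain NumPy, showing the standard closed-form solution $\alpha = (K + n\lambda I)^{-1} y$ with an RBF kernel. This is only background for the abstract above, not the paper's blockwise decomposition analysis; the kernel choice, the hyperparameters, and the toy data (noisy samples of a sine signal, loosely mimicking "noisy views of a common signal") are all illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=10.0):
    """Gaussian/RBF kernel matrix via broadcasting: K[i, j] = exp(-gamma * ||x_i - z_j||^2)."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def krr_fit(X, y, lam=1e-3, gamma=10.0):
    """Closed-form KRR coefficients: alpha = (K + n*lam*I)^{-1} y."""
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + n * lam * np.eye(n), y)

def krr_predict(X_train, alpha, X_test, gamma=10.0):
    """Prediction at test points: f(x) = sum_i alpha_i k(x, x_i)."""
    return rbf_kernel(X_test, X_train, gamma) @ alpha

# Toy dependent-style data: noisy observations of a shared signal sin(3x).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(200)

alpha = krr_fit(X, y)
X_test = np.linspace(-1, 1, 50)[:, None]
pred = krr_predict(X, alpha, X_test)
```

The excess risk bounds discussed in the abstract quantify how far such a fitted predictor is from the best attainable one, as a function of the kernel's eigenvalue decay, the regularization $\lambda$, and (in this paper's setting) the dependence structure among the samples.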