🤖 AI Summary
This paper quantifies the minimax excess risk of spectral algorithms when the kernel is learned from data. Classical theory relies on restrictive assumptions such as eigenvalue decay and source conditions; to move beyond them, we introduce the effective span dimension (ESD), an alignment-sensitive complexity measure that captures the joint influence of the signal, the spectral structure, and the noise level. Leveraging spectral analysis and minimax risk theory, we show that for sequence models whose ESD is at most $K$, the minimax excess risk scales as $\sigma^2 K$. We further show that over-parameterized gradient flow enables adaptive feature learning that can reduce the ESD and thereby improve generalization. Our framework extends to linear models and RKHS regression. Numerical experiments support the tightness of the theoretical bounds and the broad applicability of the proposed framework.
📝 Abstract
We study spectral algorithms in the setting where kernels are learned from data. We introduce the effective span dimension (ESD), an alignment-sensitive complexity measure that depends jointly on the signal, spectrum, and noise level $\sigma^2$. The ESD is well-defined for arbitrary kernels and signals without requiring eigen-decay conditions or source conditions. We prove that for sequence models whose ESD is at most $K$, the minimax excess risk scales as $\sigma^2 K$. Furthermore, we analyze over-parameterized gradient flow and prove that it can reduce the ESD. This finding establishes a connection between adaptive feature learning and provable improvements in the generalization of spectral algorithms. We demonstrate the generality of the ESD framework by extending it to linear models and RKHS regression, and we support the theory with numerical experiments. This framework provides a novel perspective on generalization beyond traditional fixed-kernel theories.
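To make the $\sigma^2 K$ scaling concrete, below is a minimal Python sketch. The function `esd_proxy` is a hypothetical stand-in for the ESD (the paper's precise definition is not reproduced in this abstract): it returns the smallest $K$ at which the signal energy outside the $K$ best-aligned coordinates falls below the noise level $\sigma^2$. The toy sequence model and the oracle spectral cutoff are likewise illustrative assumptions, not the paper's estimator.

```python
import numpy as np

# Hypothetical proxy for the effective span dimension (ESD): smallest K such
# that the signal energy left outside the K best-aligned coordinates drops
# below the noise level sigma^2. This is an assumed stand-in, not the paper's
# exact definition.
def esd_proxy(theta, sigma2):
    energy = np.sort(theta**2)[::-1]          # coordinate-wise signal energy, largest first
    tail = energy.sum() - np.cumsum(energy)   # energy missed by keeping only the top K coordinates
    ks = np.nonzero(tail <= sigma2)[0]
    return int(ks[0]) + 1 if ks.size else len(theta)

# Toy sequence model: observe y_i = theta_i + sigma * eps_i and estimate theta
# by keeping only the top-K coordinates (an idealized spectral cutoff).
rng = np.random.default_rng(0)
n, sigma2 = 2000, 0.01
theta = 1.0 / np.arange(1, n + 1) ** 1.2      # illustrative polynomially decaying signal
K = esd_proxy(theta, sigma2)

y = theta + np.sqrt(sigma2) * rng.standard_normal(n)
keep = np.argsort(theta**2)[::-1][:K]         # oracle choice of the K best-aligned coordinates
theta_hat = np.zeros(n)
theta_hat[keep] = y[keep]

excess_risk = np.sum((theta_hat - theta) ** 2)
print(f"ESD proxy K = {K}, excess risk ~ {excess_risk:.4f}, sigma^2 * K = {sigma2 * K:.4f}")
```

Under this proxy, the bias of the cutoff estimator is at most $\sigma^2$ by construction and the variance is roughly $\sigma^2 K$, so the printed excess risk should be on the order of $\sigma^2 K$, mirroring the rate stated above.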