Population Risk Bounds for Kolmogorov-Arnold Networks Trained by DP-SGD with Correlated Noise

📅 2026-05-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the absence of generalization guarantees for Kolmogorov-Arnold Networks (KANs) trained by mini-batch SGD with gradient clipping, both non-privately and via differentially private SGD (DP-SGD) with Gaussian noise that ranges from independent to temporally correlated. According to the authors, it is the first optimization and population risk analysis of a correlated-noise DP mechanism beyond convex learning, and it departs from the full-batch, independent-noise setting of prior KAN theory. Two obstacles arise: temporal correlation breaks the conditional-centering structure behind standard one-step SGD arguments, and the projection step obstructs the exact cancellation of correlated perturbations. The analysis handles both by combining an auxiliary unprojected dynamics, a shifted iterate that absorbs the current noise perturbation, and a high-probability bootstrap certifying that the projection stays inactive, followed by a stability-based generalization argument. The resulting population risk bounds cover the full-batch GD and independent-noise DP-GD results for KANs of Wang et al. (2026) and are sharper when the second layer is fixed. The setting is motivated by empirical evidence that correlated-noise mechanisms achieve a more favorable privacy-utility tradeoff than independent-noise ones.

📝 Abstract

We establish the first population risk bounds for Kolmogorov-Arnold Networks (KANs) trained by mini-batch SGD with gradient clipping, covering non-private SGD as well as differentially private SGD (DP-SGD) with Gaussian perturbations that interpolate between independent and temporally correlated noise. This setting is substantially closer to practice than prior KAN theory along two axes: training is by mini-batch SGD, the standard recipe for modern networks, rather than full-batch gradient descent (GD); and correlated-noise mechanisms have empirically shown a more favorable privacy-utility tradeoff than independent-noise mechanisms. Our results cover the corresponding full-batch GD and independent-noise DP-GD results for KANs by Wang et al. (2026), while yielding sharper fixed-second-layer specializations. The technical core is a new analysis route for correlated-noise DP training in the non-convex regime. Temporal dependence breaks the conditional-centering structure underlying standard one-step SGD arguments, and the projection step obstructs the exact cancellation structure of correlated perturbations. We address these difficulties through an auxiliary unprojected dynamics, a shifted iterate that absorbs the current noise perturbation, and a high-probability bootstrap certifying projection inactivity. Combining this optimization analysis with a stability-based generalization argument yields the stated population risk bounds. To the best of our knowledge, this is the first optimization and population risk analysis of a correlated-noise mechanism for DP training beyond convex learning, in particular for neural networks.
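For orientation, the training loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the abstract does not specify the noise mechanism, so the sketch assumes the noise at step t is z_t - lam * z_{t-1} with i.i.d. Gaussian z_t, where the hypothetical parameter `lam` in [0, 1] interpolates between independent noise (`lam = 0`) and anti-correlated noise. The function names, the ball projection around the initial point, and all hyperparameters are likewise illustrative, and `sigma` is not calibrated to any privacy budget.

```python
import numpy as np

def clip(g, c):
    """Rescale g so that its Euclidean norm is at most c."""
    norm = np.linalg.norm(g)
    return g if norm <= c else g * (c / norm)

def project(w, w0, radius):
    """Project w onto the Euclidean ball of the given radius centered at w0."""
    d = w - w0
    norm = np.linalg.norm(d)
    return w if norm <= radius else w0 + d * (radius / norm)

def dp_sgd_correlated(grad_fn, data, w0, steps, batch, lr, c, sigma, lam,
                      radius, seed=0):
    """Projected mini-batch SGD with per-example clipping and Gaussian noise.

    Assumption (not from the source): the injected noise is z_t - lam * z_{t-1}.
    """
    rng = np.random.default_rng(seed)
    w = w0.copy()
    z_prev = np.zeros_like(w0)
    for _ in range(steps):
        # Sample a mini-batch without replacement.
        idx = rng.choice(len(data), size=batch, replace=False)
        # Clip each per-example gradient, then average over the batch.
        g = np.mean([clip(grad_fn(w, data[i]), c) for i in idx], axis=0)
        # Fresh Gaussian draw; subtracting part of the previous draw
        # correlates the noise across consecutive steps.
        z = sigma * rng.standard_normal(w.shape)
        noise = z - lam * z_prev
        # Noisy gradient step followed by projection.
        w = project(w - lr * (g + noise), w0, radius)
        z_prev = z
    return w
```

In this assumed form, consecutive noise terms partially cancel when the updates are summed, but only if no projection intervenes between steps. That gives a rough picture of why the abstract treats the projection step as an obstacle and analyzes an auxiliary unprojected dynamics.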

Problem

Research questions and friction points this paper is trying to address.

Kolmogorov-Arnold Networks

DP-SGD

correlated noise

population risk

differential privacy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Kolmogorov-Arnold Networks

correlated noise

differentially private SGD