🤖 AI Summary
This paper investigates the theoretical approximation capabilities of Kolmogorov–Arnold Networks (KANs), addressing the fundamental question: can KANs achieve optimal approximation of Besov functions on arbitrary bounded or fractal domains, with dimension-independent sample complexity? To this end, the authors propose a KAN architecture featuring trainable B-spline activation functions and residual connections. They establish, for the first time, a rigorous theory of optimal nonlinear approximation of Besov functions by KANs. Key contributions include: (1) a proof that KANs attain optimal approximation rates in Besov spaces; (2) a dimension-independent upper bound on the generalization error in the noiseless setting; and (3) theoretical guarantees for uniformly accurate learning of both target functions and their derivatives. Together, these results provide an approximation-theoretic foundation for KANs along with scalable learning guarantees.
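To make the architecture concrete, below is a minimal PyTorch sketch of a single residual KAN layer in which every input–output edge carries a trainable B-spline activation and a plain linear map serves as the skip branch. This is an illustrative reconstruction under assumed design choices (the `bspline_basis` helper, the uniform knot grid on a fixed input range, and the linear skip are all hypothetical), not the authors' exact parameterization.

```python
import torch
import torch.nn as nn


def bspline_basis(x, grid, degree):
    """Evaluate B-spline basis functions at x via the Cox-de Boor recursion.

    x: (..., 1) inputs; grid: (G,) uniform knot vector.
    Returns (..., G - 1 - degree) basis values.
    """
    # degree-0 basis: indicator of each knot interval
    bases = ((x >= grid[:-1]) & (x < grid[1:])).to(x.dtype)
    for k in range(1, degree + 1):
        left = (x - grid[:-(k + 1)]) / (grid[k:-1] - grid[:-(k + 1)]) * bases[..., :-1]
        right = (grid[k + 1:] - x) / (grid[k + 1:] - grid[1:-k]) * bases[..., 1:]
        bases = left + right
    return bases


class ResidualKANLayer(nn.Module):
    """One KAN layer: y_j = sum_i phi_{j,i}(x_i) with trainable spline phi's,
    plus a linear skip connection as the residual branch (hypothetical sketch)."""

    def __init__(self, d_in, d_out, num_knots=12, degree=3, x_range=(-1.0, 1.0)):
        super().__init__()
        lo, hi = x_range
        # extend the uniform knot grid beyond [lo, hi] so all splines are supported there
        h = (hi - lo) / (num_knots - 1)
        grid = torch.linspace(lo - degree * h, hi + degree * h, num_knots + 2 * degree)
        self.register_buffer("grid", grid)
        self.degree = degree
        num_basis = num_knots + degree - 1
        # trainable spline coefficients, one set per (input, output) edge
        self.coef = nn.Parameter(0.1 * torch.randn(d_in, d_out, num_basis))
        # residual / skip branch
        self.skip = nn.Linear(d_in, d_out, bias=False)

    def forward(self, x):                                                   # x: (batch, d_in)
        basis = bspline_basis(x.unsqueeze(-1), self.grid, self.degree)      # (batch, d_in, num_basis)
        spline_out = torch.einsum("bik,iok->bo", basis, self.coef)          # sum splines over edges
        return spline_out + self.skip(x)


# usage: inputs scaled into the assumed range (-1, 1)
layer = ResidualKANLayer(d_in=4, d_out=8)
y = layer(torch.rand(32, 4) * 2 - 1)    # y: (32, 8)
```

The design point being illustrated is that, unlike an MLP, the learnable nonlinearity sits on each edge (as a spline) rather than at each node, while the skip branch mirrors the residual connections highlighted in the summary.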
📝 Abstract
Inspired by the Kolmogorov-Arnold superposition theorem, Kolmogorov-Arnold Networks (KANs) have recently emerged as an improved backbone for most deep learning frameworks, promising more adaptivity than their multilayer perceptron (MLP) predecessor by allowing for trainable spline-based activation functions. In this paper, we probe the theoretical foundations of the KAN architecture by showing that it can approximate any Besov function in $B^{s}_{p,q}(\mathcal{X})$ on a bounded open, or even fractal, domain $\mathcal{X}$ in $\mathbb{R}^d$ at the optimal approximation rate with respect to any weaker Besov norm $B^{\alpha}_{p,q}(\mathcal{X})$, where $\alpha < s$. We complement our approximation guarantee with a dimension-free estimate on the sample complexity of a residual KAN model when learning a function of Besov regularity from $N$ i.i.d. noiseless samples. Our KAN architecture incorporates contemporary deep learning wisdom by leveraging residual/skip connections between layers.
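For orientation, the optimal nonlinear approximation rate alluded to above typically takes the following form; this is a sketch of the standard Besov-space rate with $n$ trainable parameters, stated here only as an assumption for illustration, and the paper's precise exponents, norms, and parameter regime may differ:

$$
\inf_{\theta:\ \mathrm{KAN}_\theta \text{ has } n \text{ parameters}}
\bigl\| f - \mathrm{KAN}_\theta \bigr\|_{B^{\alpha}_{p,q}(\mathcal{X})}
\;\lesssim\; n^{-(s-\alpha)/d}\, \| f \|_{B^{s}_{p,q}(\mathcal{X})},
\qquad \alpha < s .
$$

Note that the approximation rate itself depends on $d$; the dimension-free statement in the abstract concerns the sample-complexity estimate, not this exponent.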