Kolmogorov-Arnold Networks: Approximation and Learning Guarantees for Functions and their Derivatives

📅 2025-04-21
📈 Citations: 0 · Influential: 0
🤖 AI Summary
This paper investigates the theoretical approximation capabilities of Kolmogorov–Arnold Networks (KANs), addressing the fundamental question: can KANs achieve optimal approximation of Besov functions on arbitrary bounded or fractal domains, with dimension-independent sample complexity? To this end, the authors propose a KAN architecture featuring trainable B-spline activation functions and residual connections. They establish, for the first time, a rigorous theory of optimal nonlinear approximation of Besov functions by KANs. Key contributions: (1) a proof that KANs attain optimal approximation rates in Besov spaces; (2) a dimension-independent upper bound on generalization error in the noiseless setting; and (3) guarantees for uniform, high-accuracy learning of both target functions and their derivatives. Together, these results give KANs a rigorous approximation-theoretic foundation and scalable learning guarantees.
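For orientation, here is a minimal PyTorch sketch of one such layer: a trainable B-spline activation on every input–output edge, evaluated via the Cox–de Boor recursion, plus a linear skip path. This follows the generic KAN recipe rather than the authors' code; `KANLayer`, `grid_size`, `spline_order`, and the knot layout on $[-1, 1]$ are illustrative assumptions.

```python
# Illustrative sketch only -- generic KAN-style layer, not the paper's code.
import torch
import torch.nn as nn


def bspline_basis(x, grid, k):
    """Evaluate all order-k B-spline basis functions at x (Cox-de Boor).

    x: (batch, in_dim) inputs, assumed to lie in the knot range.
    grid: (in_dim, n_knots) knot positions per input coordinate.
    Returns: (batch, in_dim, n_knots - k - 1) basis values.
    """
    x = x.unsqueeze(-1)
    # Order 0: indicator of each knot interval.
    b = ((x >= grid[:, :-1]) & (x < grid[:, 1:])).to(x.dtype)
    for d in range(1, k + 1):  # Cox-de Boor recursion
        left = (x - grid[:, :-(d + 1)]) / (grid[:, d:-1] - grid[:, :-(d + 1)])
        right = (grid[:, d + 1:] - x) / (grid[:, d + 1:] - grid[:, 1:-d])
        b = left * b[..., :-1] + right * b[..., 1:]
    return b


class KANLayer(nn.Module):
    """y_j = sum_i phi_{j,i}(x_i) with trainable splines phi, plus a skip path."""

    def __init__(self, in_dim, out_dim, grid_size=8, spline_order=3):
        super().__init__()
        h = 2.0 / grid_size  # uniform knots on [-1, 1], padded for the order
        knots = torch.arange(-spline_order, grid_size + spline_order + 1) * h - 1.0
        self.register_buffer("grid", knots.repeat(in_dim, 1))
        self.order = spline_order
        n_basis = grid_size + spline_order
        # One trainable coefficient vector per (output, input) edge.
        self.coef = nn.Parameter(0.1 * torch.randn(out_dim, in_dim, n_basis))
        self.skip = nn.Linear(in_dim, out_dim)  # residual/skip connection

    def forward(self, x):
        basis = bspline_basis(x, self.grid, self.order)  # (batch, in, n_basis)
        spline = torch.einsum("bik,oik->bo", basis, self.coef)
        return spline + self.skip(x)
```

Stacking such layers gives a deep residual KAN; the paper's point is that this class is expressive enough to attain the optimal Besov rates described in the abstract below.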

📝 Abstract
Inspired by the Kolmogorov-Arnold superposition theorem, Kolmogorov-Arnold Networks (KANs) have recently emerged as an improved backbone for most deep learning frameworks, promising more adaptivity than their multilayer perceptron (MLP) predecessor by allowing for trainable spline-based activation functions. In this paper, we probe the theoretical foundations of the KAN architecture by showing that it can approximate any Besov function in $B^{s}_{p,q}(\mathcal{X})$ on a bounded open, or even fractal, domain $\mathcal{X}$ in $\mathbb{R}^d$ at the optimal approximation rate with respect to any weaker Besov norm $B^{\alpha}_{p,q}(\mathcal{X})$, where $\alpha < s$. We complement our approximation guarantee with a dimension-free estimate on the sample complexity of a residual KAN model when learning a function of Besov regularity from $N$ i.i.d. noiseless samples. Our KAN architecture incorporates contemporary deep learning wisdom by leveraging residual/skip connections between layers.
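In symbols, the abstract's claim reads roughly as follows (a hedged restatement, not a verbatim theorem; exact constants, parameter ranges, and architecture sizes are in the paper). For $f \in B^{s}_{p,q}(\mathcal{X})$ and any $\alpha < s$, a KAN $\hat{f}_n$ with $n$ trainable parameters should achieve

\[
\big\| f - \hat{f}_n \big\|_{B^{\alpha}_{p,q}(\mathcal{X})} \;\lesssim\; n^{-(s-\alpha)/d} \, \| f \|_{B^{s}_{p,q}(\mathcal{X})},
\]

where $n^{-(s-\alpha)/d}$ is the classical optimal nonlinear approximation rate for this pair of Besov spaces: smoothness $s$ is traded against the error norm's smoothness $\alpha$, so measuring error in a weaker norm (small $\alpha$) buys a faster rate, while $\alpha > 0$ yields simultaneous control of derivatives.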
Problem

Research questions and friction points this paper is trying to address.

Approximating Besov functions optimally with KANs
Learning Besov functions from noiseless samples efficiently (formalized in the sketch after this list)
Enhancing deep learning via trainable spline-based activation functions
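The second question can be made concrete (a sketch of the setup as described in the abstract; the exact estimator class and bound are in the paper). One observes $N$ i.i.d. noiseless samples of the target $f$ and fits a residual KAN by empirical risk minimization:

\[
\hat{f}_N \in \operatorname*{arg\,min}_{g \in \mathcal{F}_{\mathrm{KAN}}} \; \frac{1}{N} \sum_{i=1}^{N} \big( g(x_i) - f(x_i) \big)^2, \qquad x_1, \dots, x_N \overset{\text{i.i.d.}}{\sim} \mu .
\]

The advertised guarantee is that the generalization error of $\hat{f}_N$ decays in $N$ at a rate whose exponent does not depend on the ambient dimension $d$; the exponent itself is not restated here.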
Innovation

Methods, ideas, or system contributions that make the work stand out.

Trainable spline-based activation functions
Optimal Besov function approximation
Residual connections enhance learning (see the fitting sketch after this list)
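To tie the three items together, a self-contained end-to-end sketch under the abstract's setup: an order-1 (piecewise-linear) learnable spline stands in for B-splines to keep the code short, a skip path mirrors the residual connections, and the loop runs the noiseless empirical risk minimization formalized earlier. All names and hyperparameters here (`LinearSplineAct`, `ResidualSplineNet`, grid sizes, learning rate) are illustrative assumptions, not the authors' setup.

```python
# Illustrative sketch only: fit a spline-activation network with a skip path
# to N i.i.d. noiseless samples of a smooth target.
import torch
import torch.nn as nn


class LinearSplineAct(nn.Module):
    """Learnable piecewise-linear (order-1 spline) activation, one per channel."""

    def __init__(self, channels, grid_size=16):
        super().__init__()
        self.register_buffer("knots", torch.linspace(-3.0, 3.0, grid_size))
        self.values = nn.Parameter(0.1 * torch.randn(channels, grid_size))

    def forward(self, x):                              # x: (batch, channels)
        k = self.knots
        t = x.clamp(k[0], k[-1])                       # stay inside the knot range
        idx = (torch.bucketize(t, k) - 1).clamp(0, len(k) - 2)
        w = (t - k[idx]) / (k[idx + 1] - k[idx])       # interpolation weight in [0, 1]
        vals = self.values.unsqueeze(0).expand(x.shape[0], -1, -1)
        v0 = vals.gather(2, idx.unsqueeze(-1)).squeeze(-1)
        v1 = vals.gather(2, (idx + 1).unsqueeze(-1)).squeeze(-1)
        return (1 - w) * v0 + w * v1                   # linear interpolation


class ResidualSplineNet(nn.Module):
    """Two linear maps around a learnable spline activation, with a skip path."""

    def __init__(self, dim=2, width=32):
        super().__init__()
        self.inp = nn.Linear(dim, width)
        self.act = LinearSplineAct(width)
        self.out = nn.Linear(width, 1)

    def forward(self, x):
        h = self.inp(x)
        return self.out(h + self.act(h))               # skip connection around the spline


torch.manual_seed(0)
f = lambda x: torch.sin(3 * x[:, :1]) * torch.exp(-x[:, 1:] ** 2)
X = 2 * torch.rand(1024, 2) - 1                        # N = 1024 i.i.d. samples on [-1, 1]^2
Y = f(X)                                               # noiseless labels

model = ResidualSplineNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for step in range(2000):
    opt.zero_grad()
    loss = ((model(X) - Y) ** 2).mean()                # empirical L2 risk
    loss.backward()
    opt.step()
print(f"final train MSE: {loss.item():.2e}")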