🤖 AI Summary
This paper investigates the theoretical approximation capabilities of Kolmogorov–Arnold Networks (KANs), addressing the fundamental question: can KANs achieve optimal approximation of Besov functions on arbitrary bounded or fractal domains, with dimension-independent sample complexity? To this end, the authors propose a KAN architecture featuring trainable B-spline activation functions and residual connections. They establish, for the first time, a rigorous theory of optimal nonlinear approximation of Besov functions by KANs. Key contributions include: (1) a proof that KANs attain optimal approximation rates in Besov spaces; (2) a dimension-independent upper bound on the generalization error in the noiseless setting; and (3) theoretical guarantees for uniformly accurate learning of both target functions and their derivatives. Together, these results provide an approximation-theoretic foundation for KANs along with scalable learning guarantees.
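To make the architecture concrete, below is a minimal PyTorch sketch of a single residual KAN layer in which every input–output edge carries a trainable B-spline activation and a plain linear map serves as the skip branch. This is an illustrative reconstruction under assumed design choices (the `bspline_basis` helper, the uniform knot grid on a fixed input range, and the linear skip are all hypothetical), not the authors' exact parameterization.

```python
import torch
import torch.nn as nn


def bspline_basis(x, grid, degree):
    """Evaluate B-spline basis functions at x via the Cox-de Boor recursion.

    x: (..., 1) inputs; grid: (G,) uniform knot vector.
    Returns (..., G - 1 - degree) basis values.
    """
    # degree-0 basis: indicator of each knot interval
    bases = ((x >= grid[:-1]) & (x < grid[1:])).to(x.dtype)
    for k in range(1, degree + 1):
        left = (x - grid[:-(k + 1)]) / (grid[k:-1] - grid[:-(k + 1)]) * bases[..., :-1]
        right = (grid[k + 1:] - x) / (grid[k + 1:] - grid[1:-k]) * bases[..., 1:]
        bases = left + right
    return bases


class ResidualKANLayer(nn.Module):
    """One KAN layer: y_j = sum_i phi_{j,i}(x_i) with trainable spline phi's,
    plus a linear skip connection as the residual branch (hypothetical sketch)."""

    def __init__(self, d_in, d_out, num_knots=12, degree=3, x_range=(-1.0, 1.0)):
        super().__init__()
        lo, hi = x_range
        # extend the uniform knot grid beyond [lo, hi] so all splines are supported there
        h = (hi - lo) / (num_knots - 1)
        grid = torch.linspace(lo - degree * h, hi + degree * h, num_knots + 2 * degree)
        self.register_buffer("grid", grid)
        self.degree = degree
        num_basis = num_knots + degree - 1
        # trainable spline coefficients, one set per (input, output) edge
        self.coef = nn.Parameter(0.1 * torch.randn(d_in, d_out, num_basis))
        # residual / skip branch
        self.skip = nn.Linear(d_in, d_out, bias=False)

    def forward(self, x):                                                   # x: (batch, d_in)
        basis = bspline_basis(x.unsqueeze(-1), self.grid, self.degree)      # (batch, d_in, num_basis)
        spline_out = torch.einsum("bik,iok->bo", basis, self.coef)          # sum splines over edges
        return spline_out + self.skip(x)


# usage: inputs scaled into the assumed range (-1, 1)
layer = ResidualKANLayer(d_in=4, d_out=8)
y = layer(torch.rand(32, 4) * 2 - 1)    # y: (32, 8)
```

The design point being illustrated is that, unlike an MLP, the learnable nonlinearity sits on each edge (as a spline) rather than at each node, while the skip branch mirrors the residual connections highlighted in the summary.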
📝 Abstract
Inspired by the Kolmogorov-Arnold superposition theorem, Kolmogorov-Arnold Networks (KANs) have recently emerged as an improved backbone for most deep learning frameworks, promising more adaptivity than their multilayer perceptron (MLP) predecessor by allowing for trainable spline-based activation functions. In this paper, we probe the theoretical foundations of the KAN architecture by showing that it can approximate any Besov function in $B^{s}_{p,q}(\mathcal{X})$ on a bounded open, or even fractal, domain $\mathcal{X}$ in $\mathbb{R}^d$ at the optimal approximation rate with respect to any weaker Besov norm $B^{\alpha}_{p,q}(\mathcal{X})$, where $\alpha < s$. We complement our approximation guarantee with a dimension-free estimate on the sample complexity of a residual KAN model when learning a function of Besov regularity from $N$ i.i.d. noiseless samples. Our KAN architecture incorporates contemporary deep learning wisdom by leveraging residual/skip connections between layers.
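For orientation, the optimal nonlinear approximation rate alluded to above typically takes the following form; this is a sketch of the standard Besov-space rate with $n$ trainable parameters, stated here only as an assumption for illustration, and the paper's precise exponents, norms, and parameter regime may differ:

$$
\inf_{\theta:\ \mathrm{KAN}_\theta \text{ has } n \text{ parameters}}
\bigl\| f - \mathrm{KAN}_\theta \bigr\|_{B^{\alpha}_{p,q}(\mathcal{X})}
\;\lesssim\; n^{-(s-\alpha)/d}\, \| f \|_{B^{s}_{p,q}(\mathcal{X})},
\qquad \alpha < s .
$$

Note that the approximation rate itself depends on $d$; the dimension-free statement in the abstract concerns the sample-complexity estimate, not this exponent.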