Initialization Schemes for Kolmogorov-Arnold Networks: An Empirical Study

📅 2025-09-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the largely unexplored question of how to initialize Kolmogorov–Arnold Networks (KANs). For spline-based KANs, it proposes (1) two theory-driven initialization schemes inspired by LeCun and Glorot, and (2) an empirical power-law family with tunable exponents. The evaluation combines large-scale grid searches on function fitting and forward PDE benchmarks, a Neural Tangent Kernel analysis of training dynamics, and experiments on a subset of the Feynman dataset. The Glorot-inspired initialization significantly outperforms the baseline in parameter-rich models, while the power-law initialization achieves the strongest performance overall, both across tasks and across architectures of varying size.

📝 Abstract

Kolmogorov-Arnold Networks (KANs) are a recently introduced neural architecture that replaces fixed nonlinearities with trainable activation functions, offering enhanced flexibility and interpretability. While KANs have been applied successfully across scientific and machine learning tasks, their initialization strategies remain largely unexplored. In this work, we study initialization schemes for spline-based KANs, proposing two theory-driven approaches inspired by LeCun and Glorot, as well as an empirical power-law family with tunable exponents. Our evaluation combines large-scale grid searches on function fitting and forward PDE benchmarks, an analysis of training dynamics through the lens of the Neural Tangent Kernel, and evaluations on a subset of the Feynman dataset. Our findings indicate that the Glorot-inspired initialization significantly outperforms the baseline in parameter-rich models, while power-law initialization achieves the strongest performance overall, both across tasks and for architectures of varying size. All code and data accompanying this manuscript are publicly available at https://github.com/srigas/KAN_Initialization_Schemes.

Problem

Research questions and friction points this paper is trying to address.

Investigating initialization strategies for spline-based Kolmogorov-Arnold Networks

Evaluating theory-driven and empirical initialization methods for KANs

Assessing initialization performance across function fitting and PDE benchmarks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposing Glorot-inspired initialization for KANs

Developing empirical power-law family initialization

Using spline-based trainable activation functions
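
To make the three initialization families named above more concrete, here is a minimal NumPy sketch. It is not the paper's implementation. The LeCun and Glorot functions use the classical standard deviations from dense networks, sqrt(1/fan_in) and sqrt(2/(fan_in + fan_out)); the paper's KAN-specific variants account for the spline basis and are not reproduced here. The power-law form, its exponents `alpha` and `beta`, and the helper `init_spline_coeffs` are hypothetical names chosen for illustration only.

```python
import numpy as np


def lecun_std(fan_in: int) -> float:
    """Classical LeCun standard deviation: sqrt(1 / fan_in)."""
    return float(np.sqrt(1.0 / fan_in))


def glorot_std(fan_in: int, fan_out: int) -> float:
    """Classical Glorot standard deviation: sqrt(2 / (fan_in + fan_out))."""
    return float(np.sqrt(2.0 / (fan_in + fan_out)))


def power_law_std(fan_in: int, n_basis: int, alpha: float, beta: float) -> float:
    """Hypothetical power-law scale with two tunable exponents.

    ASSUMPTION: this functional form is illustrative and is not taken
    from the paper, which only states that the family has tunable exponents.
    """
    return float(fan_in ** (-alpha) * n_basis ** (-beta))


def init_spline_coeffs(fan_in: int, fan_out: int, n_basis: int,
                       std: float, seed: int = 0) -> np.ndarray:
    """Draw zero-mean Gaussian spline coefficients, one vector per edge.

    ASSUMPTION: a KAN layer is modeled as fan_out x fan_in edges, each
    carrying n_basis trainable spline coefficients.
    """
    rng = np.random.default_rng(seed)
    return rng.normal(loc=0.0, scale=std, size=(fan_out, fan_in, n_basis))


# Example: a 64 -> 64 layer with 8 basis functions per edge.
coeffs = init_spline_coeffs(64, 64, 8, std=glorot_std(64, 64))
```

Swapping `glorot_std(64, 64)` for `lecun_std(64)` or `power_law_std(64, 8, alpha, beta)` changes only the scale of the draw, which is the quantity a grid search over exponents would vary.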

🔎 Similar Papers

On the Robustness of Kolmogorov-Arnold Networks: An Adversarial Perspective