AI Summary
Addressing three key challenges in Kolmogorov-Arnold Networks (KANs): training instability, parameter redundancy, and the opaque mechanistic behavior of B-spline activation functions, this work proposes a Free-Knot KAN architecture. First, we derive a theoretical upper bound on the number of B-spline knots. Second, we design an adaptive free-knot mechanism that reduces the parameter count to the same order as MLPs, approximately one-tenth that of the original KAN. Third, we impose C² continuity constraints and introduce a range-expansion gradient training strategy to enhance activation smoothness and training robustness. Extensive evaluation across eight cross-domain benchmarks (image, text, time-series, multimodal, and function-approximation tasks) demonstrates consistent improvements: our method achieves superior function-approximation accuracy and downstream task performance compared with existing KAN variants, while matching or exceeding MLPs of comparable size.
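The "same order as MLPs, approximately one-tenth of the original KAN" claim follows from a back-of-envelope parameter count. The sketch below is illustrative only: the layer sizes, grid size `G`, and spline order `k` are assumptions, and the exact per-edge coefficient count varies by KAN implementation.

```python
# Hedged back-of-envelope parameter count for one layer (illustrative
# sizes, not taken from the paper): an MLP layer has one weight per
# edge, while a B-spline KAN layer carries roughly G + k spline
# coefficients per edge plus a couple of per-edge scalars.
d_in, d_out = 64, 64   # assumed layer width
G, k = 5, 3            # assumed grid size and spline order

mlp_params = d_in * d_out                # one scalar weight per edge
per_edge = G + k + 2                     # ~spline coefs + base weight + scale (implementation-dependent)
kan_params = d_in * d_out * per_edge     # original-KAN-style layer
ratio = kan_params / mlp_params          # ~10x for these settings

print(mlp_params, kan_params, ratio)
```

With these assumed settings the ratio is 10, which is consistent with the roughly tenfold reduction the free-knot variant targets when it brings the count back to the MLP scale.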
Abstract
Kolmogorov-Arnold Networks (KANs) have gained significant attention in the machine learning community. However, their implementations often suffer from poor training stability and a heavy trainable-parameter count. Furthermore, the behavior of the learned activation functions derived from B-splines is poorly understood. In this work, we analyze the behavior of KANs through the lens of spline knots and derive lower and upper bounds on the number of knots in B-spline-based KANs. To address these limitations, we propose a novel Free-Knot KAN that improves on the original KAN while reducing the number of trainable parameters to the scale of standard Multi-Layer Perceptrons (MLPs). Additionally, we introduce a new training strategy that enforces $C^2$ continuity of the learnable splines, yielding smoother activations than the original KAN, and improves training stability through range expansion. The proposed method is comprehensively evaluated on 8 datasets spanning image, text, time-series, multimodal, and function-approximation tasks. The promising results demonstrate the feasibility of KAN-based networks and the effectiveness of the proposed method.
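To make the spline-knot viewpoint concrete, here is a minimal sketch of a single KAN edge activation as a cubic B-spline with free interior knots. All names, sizes, and knot positions are illustrative assumptions, not the paper's implementation; the relevant fact is that a cubic (degree-3) B-spline with simple knots is automatically $C^2$-continuous, which is the smoothness class the proposed training strategy enforces.

```python
import numpy as np
from scipy.interpolate import BSpline

degree = 3  # cubic: C^2-continuous at simple knots

# "Free" interior knots: in a Free-Knot KAN these would be trainable
# (kept sorted inside the domain); here they are fixed for illustration.
interior = np.sort(np.array([-0.5, 0.1, 0.6]))

# Clamp the knot vector at the domain boundaries [-1, 1] by repeating
# each endpoint degree+1 times (standard clamped B-spline construction).
knots = np.concatenate(([-1.0] * (degree + 1), interior, [1.0] * (degree + 1)))

n_coef = len(knots) - degree - 1         # number of spline coefficients
rng = np.random.default_rng(0)
coef = rng.normal(size=n_coef)           # per-edge trainable coefficients

phi = BSpline(knots, coef, degree)       # the learnable activation phi(x)
x = np.linspace(-1.0, 1.0, 5)
y = phi(x)                               # evaluate the edge activation
```

Moving the interior knots instead of densifying a fixed grid is what keeps the per-edge coefficient count small; the clamped construction above keeps the spline well-defined on the whole input range.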