🤖 AI Summary
Kolmogorov–Arnold Networks (KANs) often lack symbolic interpretability in practice, as their learned activation functions rarely admit closed-form mathematical expressions.
Method: We propose Softly Symbolified Kolmogorov–Arnold Networks (S2KAN), a differentiable, sparse, end-to-end architecture that couples symbolic basis primitives with learnable soft gating. It introduces the first differentiable symbolic sparsification mechanism grounded in Minimum Description Length (MDL), enabling adaptive switching between symbol-dominated modeling and spline-based approximation. The framework integrates a symbolic dictionary, differentiable sparsity regularization, and the Kolmogorov–Arnold representation.
Contribution/Results: S2KAN achieves state-of-the-art accuracy on symbolic regression, dynamical-system forecasting, and real-world prediction tasks while substantially reducing model size. Notably, spontaneous symbolic emergence occurs even without explicit regularization, demonstrating an intrinsic capacity to induce interpretability.
📝 Abstract
Kolmogorov-Arnold Networks (KANs) offer a promising path toward interpretable machine learning: their learnable activations can be studied individually, while collectively fitting complex data accurately. In practice, however, trained activations often lack symbolic fidelity, learning pathological decompositions with no meaningful correspondence to interpretable forms. We propose Softly Symbolified Kolmogorov-Arnold Networks (S2KAN), which integrate symbolic primitives directly into training. Each activation draws from a dictionary of symbolic and dense terms, with learnable gates that sparsify the representation. Crucially, this sparsification is differentiable, enabling end-to-end optimization, and is guided by a principled Minimum Description Length objective. When symbolic terms suffice, S2KAN discovers interpretable forms; when they do not, it gracefully degrades to dense splines. We demonstrate competitive or superior accuracy with substantially smaller models across symbolic benchmarks, dynamical systems forecasting, and real-world prediction tasks, and observe evidence of emergent self-sparsification even without regularization pressure.
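The gated dictionary mechanism the abstract describes can be sketched in a few lines. This is a hedged illustration, not the authors' implementation: the primitive set, the sigmoid gating, and the `mdl_penalty` surrogate are all assumptions, and the dense spline term is omitted for brevity.

```python
import numpy as np

# Illustrative sketch (not the paper's code): one S2KAN-style activation
# mixing a small dictionary of symbolic primitives via learnable soft gates.
PRIMITIVES = [
    ("x",   lambda x: x),
    ("x^2", lambda x: x**2),
    ("sin", np.sin),
    ("exp", lambda x: np.exp(np.clip(x, -10.0, 10.0))),  # clipped for stability
]

def soft_gates(logits):
    """Sigmoid gates in (0, 1); sparsity pressure drives them near-binary."""
    return 1.0 / (1.0 + np.exp(-logits))

def activation(x, logits, weights):
    """phi(x) = sum_k gate_k * w_k * b_k(x) over the symbolic dictionary."""
    g = soft_gates(logits)
    return sum(gk * wk * f(x) for gk, wk, (_, f) in zip(g, weights, PRIMITIVES))

def mdl_penalty(logits, bits_per_term=1.0):
    """Differentiable description-length surrogate: expected count of open gates."""
    return bits_per_term * soft_gates(logits).sum()

# Gates opened only on the sin primitive recover phi(x) ~ 2*sin(x).
logits = np.array([-8.0, -8.0, 8.0, -8.0])   # effectively [0, 0, 1, 0]
weights = np.array([0.0, 0.0, 2.0, 0.0])
x = np.linspace(-1.0, 1.0, 5)
print(np.allclose(activation(x, logits, weights), 2 * np.sin(x), atol=1e-3))
```

Because the gates are smooth sigmoids rather than hard selections, both the gate logits and the term weights receive gradients, which is what makes the sparsification end-to-end trainable; the MDL-style penalty then charges each open gate a description-length cost.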