🤖 AI Summary
To address parameter redundancy and high memory overhead when approximating multivariate continuous functions with MLPs and KANs, this paper proposes SprecherNet—a learnable neural network grounded in Sprecher's function decomposition. Methodologically, it introduces the first end-to-end trainable architecture implementing Sprecher's (1965) sum-of-shifted-splines formula, leveraging shared learnable splines and structured shift blocks to realize the formula exactly within a single layer; depth-wise composition and lateral hybrid connections further enhance cross-dimensional interaction. Theoretically, SprecherNet achieves parameter complexity of *O(LN + LG)* and peak memory of *O(N)*, substantially improving over prior methods. Empirically, it consistently outperforms MLP and KAN baselines on function approximation and downstream tasks while using fewer parameters and less GPU memory, making much wider networks practical under fixed memory budgets.
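The stated scaling can be illustrated with a back-of-the-envelope count (a hedged sketch: the counting below assumes square MLP layers of width N and one shared spline table of G knots per SprecherNet layer, which the summary does not spell out exactly):

```python
def mlp_param_count(L, N):
    """Dense MLP: L layers of N x N weight matrices -> O(L * N^2)."""
    return L * N * N

def sprecher_param_count(L, N, G):
    """Sprecher-style layer (assumed): N mixing/shift parameters plus a
    shared spline with G knots per layer -> O(L * N + L * G)."""
    return L * (N + G)

# Example: L = 4 layers, width N = 1024, spline resolution G = 64
print(mlp_param_count(4, 1024))           # 4194304
print(sprecher_param_count(4, 1024, 64))  # 4352
```

At this width the assumed Sprecher-style parameterization is roughly three orders of magnitude smaller, which is the gap the *O(LN + LG)* vs *O(LN²)* claim describes.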
📝 Abstract
We present Sprecher Networks (SNs), a family of trainable neural architectures inspired by the classical Kolmogorov-Arnold-Sprecher (KAS) construction for approximating multivariate continuous functions. Distinct from Multi-Layer Perceptrons (MLPs) with fixed node activations and Kolmogorov-Arnold Networks (KANs) with learnable edge activations, SNs utilize shared, learnable splines (monotonic and general) within structured blocks incorporating explicit shift parameters and mixing weights. Our approach directly realizes Sprecher's specific 1965 sum-of-shifted-splines formula in its single-layer variant and extends it to deeper, multi-layer compositions. We further enhance the architecture with optional lateral mixing connections that enable intra-block communication between output dimensions, providing a parameter-efficient alternative to full attention mechanisms. Beyond parameter efficiency with $O(LN + LG)$ scaling (where $G$ is the knot count of the shared splines) versus MLPs' $O(LN^2)$, SNs admit a sequential evaluation strategy that reduces peak forward-intermediate memory from $O(N^2)$ to $O(N)$ (treating batch size as constant), making much wider architectures feasible under memory constraints. We demonstrate empirically that composing these blocks into deep networks yields highly parameter- and memory-efficient models, discuss theoretical motivations, and compare SNs with related architectures (MLPs, KANs, and networks with learnable node activations).