🤖 AI Summary
This work addresses the quadratic computational complexity (O(N²)) and poor interpretability of attention mechanisms in vision backbones by proposing Vision KAN (ViK), the first attention-free visual backbone based on Kolmogorov–Arnold Networks (KANs). ViK introduces a unified token mixer, the MultiPatch-RBFKAN module, which integrates radial basis function KANs, patch-wise nonlinear transformations, axial local propagation, and low-rank global interactions. This design achieves linear computational complexity while circumventing the cost bottleneck of applying full KANs at high resolutions. Experimental results demonstrate that ViK attains accuracy competitive with state-of-the-art methods on ImageNet-1K, offering both computational efficiency and improved interpretability.
📝 Abstract
Attention mechanisms have become a key module in modern vision backbones due to their ability to model long-range dependencies. However, their quadratic complexity in sequence length and the difficulty of interpreting attention weights limit both scalability and clarity. Recent attention-free architectures demonstrate that strong performance can be achieved without pairwise attention, motivating the search for alternatives. In this work, we introduce Vision KAN (ViK), an attention-free backbone inspired by Kolmogorov–Arnold Networks (KANs). At its core lies MultiPatch-RBFKAN, a unified token mixer that combines (a) a patch-wise nonlinear transform with Radial Basis Function-based KANs, (b) axis-wise separable mixing for efficient local propagation, and (c) a low-rank global mapping for long-range interaction. Designed as a drop-in replacement for attention modules, this formulation tackles the prohibitive cost of full KANs on high-resolution features by adopting a patch-wise grouping strategy with lightweight operators to restore cross-patch dependencies. Experiments on ImageNet-1K show that ViK achieves competitive accuracy with linear complexity, demonstrating the potential of KAN-based token mixing as an efficient and theoretically grounded alternative to attention.
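To make the three components of the token mixer concrete, below is a minimal NumPy sketch of an RBF-KAN-style mixer following the abstract's description: a per-channel RBF-KAN nonlinearity, axis-wise separable local mixing, and a low-rank global interaction. All function names, shapes, and parameter choices here are illustrative assumptions, not the paper's actual implementation; the point is only that every step costs O(N) in the number of tokens.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kan(x, centers, log_widths, coef):
    """Per-channel RBF-KAN: each input channel passes through a learned
    univariate function (a sum of Gaussian bumps); outputs are combined
    linearly, in the spirit of the Kolmogorov-Arnold representation.
    x: (..., C_in); centers, log_widths: (n_basis,); coef: (C_in, n_basis, C_out)."""
    widths = np.exp(log_widths)                                  # positive widths
    phi = np.exp(-0.5 * ((x[..., None] - centers) / widths) ** 2)  # (..., C_in, n_basis)
    return np.einsum('...ib,ibo->...o', phi, coef)

def axial_mix(x):
    """Axis-wise separable local propagation: a fixed 3-tap average along H,
    then along W (a stand-in for a learned depthwise axial filter)."""
    h = (np.roll(x, 1, axis=0) + x + np.roll(x, -1, axis=0)) / 3.0
    return (np.roll(h, 1, axis=1) + h + np.roll(h, -1, axis=1)) / 3.0

def low_rank_global(x, U, V):
    """Low-rank global interaction: pool over all tokens, project the pooled
    context through a rank-r map U @ V, and broadcast it back to every token."""
    g = x.mean(axis=(0, 1))          # (C,) global context, linear in token count
    return x + (g @ U) @ V           # rank-r residual update shared by all tokens

def vik_mixer(x, centers, log_widths, coef, U, V):
    """Hypothetical MultiPatch-RBFKAN-style token mixer in residual form."""
    y = x + rbf_kan(x, centers, log_widths, coef)  # (a) patch-wise nonlinear transform
    y = axial_mix(y)                               # (b) local propagation
    return low_rank_global(y, U, V)                # (c) long-range interaction

# Toy feature map: 8x8 tokens with 16 channels, 5 RBF bases, rank-4 global map.
H, W, C, n_basis, rank = 8, 8, 16, 5, 4
x = rng.standard_normal((H, W, C))
centers = np.linspace(-2.0, 2.0, n_basis)
log_widths = np.zeros(n_basis)
coef = 0.1 * rng.standard_normal((C, n_basis, C))
U = 0.1 * rng.standard_normal((C, rank))
V = 0.1 * rng.standard_normal((rank, C))

out = vik_mixer(x, centers, log_widths, coef, U, V)
print(out.shape)  # (8, 8, 16)
```

Unlike pairwise attention, no step above forms an N×N token-token matrix: the RBF-KAN acts per token, the axial mixing touches a constant number of neighbours per token, and the global path goes through a single pooled vector, which is what keeps the overall complexity linear in sequence length.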