🤖 AI Summary
This work addresses the quadratic computational complexity (O(N²)) and poor interpretability of attention mechanisms in vision backbones by proposing Vision KAN (ViK), the first attention-free visual backbone based on Kolmogorov–Arnold Networks (KANs). ViK introduces a unified token mixer, the MultiPatch-RBFKAN module, which integrates radial basis function KANs, patch-wise nonlinear transformations, axial local propagation, and low-rank global interactions. This design achieves linear computational complexity while circumventing the cost bottleneck of applying full KANs at high resolutions. Experimental results demonstrate that ViK attains accuracy competitive with state-of-the-art methods on ImageNet-1K, offering both computational efficiency and improved interpretability.
📝 Abstract
Attention mechanisms have become a key module in modern vision backbones due to their ability to model long-range dependencies. However, their quadratic complexity in sequence length and the difficulty of interpreting attention weights limit both scalability and clarity. Recent attention-free architectures demonstrate that strong performance can be achieved without pairwise attention, motivating the search for alternatives. In this work, we introduce Vision KAN (ViK), an attention-free backbone inspired by Kolmogorov–Arnold Networks (KANs). At its core lies MultiPatch-RBFKAN, a unified token mixer that combines (a) a patch-wise nonlinear transform with Radial Basis Function-based KANs, (b) axis-wise separable mixing for efficient local propagation, and (c) a low-rank global mapping for long-range interaction. Designed as a drop-in replacement for attention modules, this formulation tackles the prohibitive cost of full KANs on high-resolution features by adopting a patch-wise grouping strategy with lightweight operators to restore cross-patch dependencies. Experiments on ImageNet-1K show that ViK achieves competitive accuracy with linear complexity, demonstrating the potential of KAN-based token mixing as an efficient and theoretically grounded alternative to attention.
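To make the three components of the token mixer concrete, below is a minimal NumPy sketch of an RBF-KAN-style mixer following the abstract's description: a per-channel RBF-KAN nonlinearity, axis-wise separable local mixing, and a low-rank global interaction. All function names, shapes, and parameter choices here are illustrative assumptions, not the paper's actual implementation; the point is only that every step costs O(N) in the number of tokens.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kan(x, centers, log_widths, coef):
    """Per-channel RBF-KAN: each input channel passes through a learned
    univariate function (a sum of Gaussian bumps); outputs are combined
    linearly, in the spirit of the Kolmogorov-Arnold representation.
    x: (..., C_in); centers, log_widths: (n_basis,); coef: (C_in, n_basis, C_out)."""
    widths = np.exp(log_widths)                                  # positive widths
    phi = np.exp(-0.5 * ((x[..., None] - centers) / widths) ** 2)  # (..., C_in, n_basis)
    return np.einsum('...ib,ibo->...o', phi, coef)

def axial_mix(x):
    """Axis-wise separable local propagation: a fixed 3-tap average along H,
    then along W (a stand-in for a learned depthwise axial filter)."""
    h = (np.roll(x, 1, axis=0) + x + np.roll(x, -1, axis=0)) / 3.0
    return (np.roll(h, 1, axis=1) + h + np.roll(h, -1, axis=1)) / 3.0

def low_rank_global(x, U, V):
    """Low-rank global interaction: pool over all tokens, project the pooled
    context through a rank-r map U @ V, and broadcast it back to every token."""
    g = x.mean(axis=(0, 1))          # (C,) global context, linear in token count
    return x + (g @ U) @ V           # rank-r residual update shared by all tokens

def vik_mixer(x, centers, log_widths, coef, U, V):
    """Hypothetical MultiPatch-RBFKAN-style token mixer in residual form."""
    y = x + rbf_kan(x, centers, log_widths, coef)  # (a) patch-wise nonlinear transform
    y = axial_mix(y)                               # (b) local propagation
    return low_rank_global(y, U, V)                # (c) long-range interaction

# Toy feature map: 8x8 tokens with 16 channels, 5 RBF bases, rank-4 global map.
H, W, C, n_basis, rank = 8, 8, 16, 5, 4
x = rng.standard_normal((H, W, C))
centers = np.linspace(-2.0, 2.0, n_basis)
log_widths = np.zeros(n_basis)
coef = 0.1 * rng.standard_normal((C, n_basis, C))
U = 0.1 * rng.standard_normal((C, rank))
V = 0.1 * rng.standard_normal((rank, C))

out = vik_mixer(x, centers, log_widths, coef, U, V)
print(out.shape)  # (8, 8, 16)
```

Unlike pairwise attention, no step above forms an N×N token-token matrix: the RBF-KAN acts per token, the axial mixing touches a constant number of neighbours per token, and the global path goes through a single pooled vector, which is what keeps the overall complexity linear in sequence length.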