Vision KAN: Towards an Attention-Free Backbone for Vision with Kolmogorov-Arnold Networks

📅 2026-01-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high computational complexity (O(N²)) and poor interpretability of attention mechanisms in vision backbones by proposing Vision KAN (ViK), the first attention-free visual backbone based on Kolmogorov-Arnold Networks (KANs). ViK introduces a unified token mixer, the MultiPatch-RBFKAN module, which integrates radial basis function KANs, patch-wise nonlinear transformations, axial local propagation, and low-rank global interactions. This design achieves linear computational complexity while circumventing the cost of evaluating full KANs at high resolutions. Experiments show that ViK attains accuracy on ImageNet-1K competitive with state-of-the-art methods while offering both computational efficiency and improved interpretability.
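To make the RBF-KAN component concrete, here is a minimal PyTorch sketch of a KAN layer whose learnable edge functions are sums of Gaussian radial basis functions. The class name RBFKANLayer, the uniform grid of centers, the bandwidth choice, and the initialization scale are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn


class RBFKANLayer(nn.Module):
    """Sketch of a KAN layer with RBF-parameterized edge functions.

    Each edge function is phi_{ij}(x) = sum_k w[i,j,k] * exp(-gamma * (x - c_k)^2),
    and outputs follow the Kolmogorov-Arnold form y_j = sum_i phi_{ij}(x_i).
    Grid range, center count, and init scale are assumptions, not the paper's.
    """

    def __init__(self, in_dim: int, out_dim: int, num_centers: int = 8,
                 grid_min: float = -2.0, grid_max: float = 2.0):
        super().__init__()
        # Fixed RBF centers on a uniform grid; bandwidth tied to grid spacing.
        self.register_buffer("centers",
                             torch.linspace(grid_min, grid_max, num_centers))
        self.gamma = ((grid_max - grid_min) / (num_centers - 1)) ** -2
        # One learnable coefficient per (input, output, center) triple.
        self.coef = nn.Parameter(0.1 * torch.randn(in_dim, out_dim, num_centers))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (..., in_dim) -> Gaussian features: (..., in_dim, num_centers)
        basis = torch.exp(-self.gamma * (x.unsqueeze(-1) - self.centers) ** 2)
        # Sum the learnable univariate functions over inputs and centers.
        return torch.einsum("...ik,iok->...o", basis, self.coef)
```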

📝 Abstract
Attention mechanisms have become a key module in modern vision backbones due to their ability to model long-range dependencies. However, their quadratic complexity in sequence length and the difficulty of interpreting attention weights limit both scalability and clarity. Recent attention-free architectures demonstrate that strong performance can be achieved without pairwise attention, motivating the search for alternatives. In this work, we introduce Vision KAN (ViK), an attention-free backbone inspired by Kolmogorov-Arnold Networks (KANs). At its core lies MultiPatch-RBFKAN, a unified token mixer that combines (a) a patch-wise nonlinear transform with Radial Basis Function (RBF)-based KANs, (b) axis-wise separable mixing for efficient local propagation, and (c) a low-rank global mapping for long-range interaction. Employed as a drop-in replacement for attention modules, this formulation tackles the prohibitive cost of full KANs on high-resolution features by adopting a patch-wise grouping strategy, with lightweight operators restoring cross-patch dependencies. Experiments on ImageNet-1K show that ViK achieves competitive accuracy with linear complexity, demonstrating the potential of KAN-based token mixing as an efficient and theoretically grounded alternative to attention.
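As a rough illustration of how the three ingredients named in the abstract could compose into one token mixer, the hypothetical sketch below reuses the RBFKANLayer sketch above. The class name, patch size, low-rank width, axial kernel size, and the simple residual summation of branches are all assumptions; the paper's actual MultiPatch-RBFKAN design may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiPatchRBFKANMixer(nn.Module):
    """Hypothetical composition of (a) a patch-wise RBF-KAN, (b) axial local
    mixing, and (c) low-rank global interaction; each branch is O(N) in the
    number of tokens N."""

    def __init__(self, dim: int, patch: int = 4, rank: int = 16):
        super().__init__()
        self.patch = patch
        # (a) RBF-KAN applied over channels of patch-pooled features, so the
        # KAN cost stays independent of the full spatial resolution.
        self.kan = RBFKANLayer(dim, dim)
        # (b) Axis-wise separable (depthwise) convolutions for local mixing.
        self.mix_h = nn.Conv2d(dim, dim, (5, 1), padding=(2, 0), groups=dim)
        self.mix_w = nn.Conv2d(dim, dim, (1, 5), padding=(0, 2), groups=dim)
        # (c) Low-rank bottleneck producing a shared global context vector.
        self.down = nn.Linear(dim, rank)
        self.up = nn.Linear(rank, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); assumes H and W are divisible by the patch size.
        B, C, H, W = x.shape
        # (a) Patch-wise nonlinear transform via the RBF-KAN.
        p = F.avg_pool2d(x, self.patch)                      # (B, C, H/p, W/p)
        k = self.kan(p.flatten(2).transpose(1, 2))           # (B, Np, C)
        k = k.transpose(1, 2).reshape(B, C, H // self.patch, W // self.patch)
        nonlin = F.interpolate(k, size=(H, W), mode="nearest")
        # (b) Axial local propagation restores cross-patch dependencies.
        local = self.mix_w(self.mix_h(x))
        # (c) Low-rank global mapping: O(N * rank) long-range interaction.
        tokens = x.flatten(2).transpose(1, 2)                # (B, N, C)
        g = self.up(self.down(tokens).mean(dim=1))           # (B, C)
        glob = g[:, :, None, None].expand_as(x)
        return x + nonlin + local + glob                     # residual sum
```

Under these assumptions every branch costs O(N) in the token count, matching the linear-complexity claim; only the way the branches are combined here is guessed, not taken from the paper.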
Problem

Research questions and friction points this paper is trying to address.

attention mechanism, quadratic complexity, interpretability, vision backbone, scalability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Kolmogorov-Arnold Networks, attention-free, token mixer, Radial Basis Function, linear complexity
👥 Authors
Zhuoqin Yang
College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
Jiansong Zhang
College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
Xiaoling Luo
Shenzhen University; Harbin Institute of Technology, Shenzhen
Medical image processing, Computer vision
Xu Wu
Associate Professor of Nuclear Engineering, North Carolina State University
Uncertainty Quantification, Scientific Machine Learning, Inverse Problems, Nuclear Engineering
Zheng Lu
University of Nottingham Ningbo China
Computer Vision, Natural Language Processing, Machine Learning
LinLin Shen
School of Artificial Intelligence, Shenzhen University, Shenzhen, China