Hyb-KAN ViT: Hybrid Kolmogorov-Arnold Networks Augmented Vision Transformer

📅 2025-05-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the limited modeling capacity, lack of multi-scale representation, and weak edge awareness of the MLP modules in Vision Transformers (ViTs). The authors propose Hyb-KAN ViT, a vision backbone that combines orthogonal wavelet multi-resolution analysis with spline-based Kolmogorov-Arnold Networks (KANs). Its core is a pair of complementary modules: Eff-KAN, which replaces MLPs with learnable B-spline activations, and Wav-KAN, a joint frequency-spatial feature extractor that embeds wavelet-derived edge priors. The authors present this as the first work to systematically incorporate wavelet priors and KAN interpretability into both the ViT encoder layers and the classification head. Experiments on ImageNet-1K, COCO, and ADE20K are reported to reach state-of-the-art performance, with gains in parameter efficiency and accuracy across classification, detection, and segmentation. Ablation studies indicate that the wavelet priors are critical for segmentation, while the spline-based design substantially improves detection.

📝 Abstract
This study addresses the inherent limitations of Multi-Layer Perceptrons (MLPs) in Vision Transformers (ViTs) by introducing Hybrid Kolmogorov-Arnold Network (KAN)-ViT (Hyb-KAN ViT), a novel framework that integrates wavelet-based spectral decomposition and spline-optimized activation functions. Prior work has not exploited the built-in modularity of the ViT architecture or the edge-detection capabilities of wavelet functions. We propose two key modules: Efficient-KAN (Eff-KAN), which replaces MLP layers with spline functions, and Wavelet-KAN (Wav-KAN), which leverages orthogonal wavelet transforms for multi-resolution feature extraction. These modules are systematically integrated into the ViT encoder layers and classification heads to enhance spatial-frequency modeling while mitigating computational bottlenecks. Experiments on ImageNet-1K (image recognition), COCO (object detection and instance segmentation), and ADE20K (semantic segmentation) demonstrate state-of-the-art performance with Hyb-KAN ViT. Ablation studies validate the efficacy of wavelet-driven spectral priors in segmentation and of spline-based efficiency in detection tasks. The framework establishes a new paradigm for balancing parameter efficiency and multi-scale representation in vision architectures.

Problem

Research questions and friction points this paper is trying to address.

Replacing MLPs in ViTs with spline-optimized KAN modules
Integrating wavelet transforms for multi-resolution feature extraction
Balancing parameter efficiency and multi-scale representation
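
To make the first point concrete, here is a minimal sketch of a spline-based KAN layer, in which every input-to-output edge carries its own learnable univariate function expressed as a weighted sum of B-spline basis functions. It is an illustration only, not the paper's Eff-KAN: the class name `SplineKANLayer`, the uniform knot grid on [-1, 1], the cubic degree, and the random initialization are all assumptions, and the residual base activation used in some KAN implementations is left out.

```python
import random


def bspline_basis(x, knots, degree):
    """Evaluate every B-spline basis function at x via the Cox-de Boor recursion."""
    # Degree 0: indicator of each half-open knot interval.
    basis = [1.0 if knots[i] <= x < knots[i + 1] else 0.0
             for i in range(len(knots) - 1)]
    for d in range(1, degree + 1):
        nxt = []
        for i in range(len(knots) - d - 1):
            left_den = knots[i + d] - knots[i]
            right_den = knots[i + d + 1] - knots[i + 1]
            left = (x - knots[i]) / left_den * basis[i] if left_den else 0.0
            right = ((knots[i + d + 1] - x) / right_den * basis[i + 1]
                     if right_den else 0.0)
            nxt.append(left + right)
        basis = nxt
    return basis


class SplineKANLayer:
    """Hypothetical KAN-style layer: y_j = sum_i phi_ji(x_i), phi_ji a B-spline."""

    def __init__(self, in_dim, out_dim, grid_size=5, degree=3,
                 lo=-1.0, hi=1.0, seed=0):
        rng = random.Random(seed)
        step = (hi - lo) / grid_size
        self.degree = degree
        # Uniform knots, extended by `degree` steps on each side of [lo, hi].
        self.knots = [lo + (i - degree) * step
                      for i in range(grid_size + 2 * degree + 1)]
        n_basis = grid_size + degree
        # coeffs[j][i][m]: weight of basis m on the edge from input i to output j.
        self.coeffs = [[[rng.gauss(0.0, 0.1) for _ in range(n_basis)]
                        for _ in range(in_dim)]
                       for _ in range(out_dim)]

    def forward(self, x):
        # Inputs are assumed to lie in [lo, hi); values outside get zero basis weight.
        bases = [bspline_basis(xi, self.knots, self.degree) for xi in x]
        return [sum(c * b
                    for edge, basis in zip(row, bases)
                    for c, b in zip(edge, basis))
                for row in self.coeffs]

    def num_params(self):
        return sum(len(edge) for row in self.coeffs for edge in row)


layer = SplineKANLayer(in_dim=4, out_dim=3)
outputs = layer.forward([0.1, -0.5, 0.9, 0.0])
```

In this sketch the parameter count is `in_dim * out_dim * (grid_size + degree)`, so the grid size and spline degree are the knobs that trade expressiveness against parameter efficiency.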

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hyb-KAN ViT integrates wavelet and spline functions
Eff-KAN replaces MLPs with spline-optimized activations
Wav-KAN uses wavelet transforms for multi-resolution features
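
As a rough illustration of what multi-resolution feature extraction with an orthogonal wavelet looks like, the sketch below applies the Haar transform, the simplest orthogonal wavelet, to a small 2D grid. The source does not say which wavelet family Wav-KAN uses, so Haar is an assumption here, and the function names `haar_dwt2` and `haar_pyramid` are hypothetical. Each level splits the input into a half-resolution approximation plus three detail bands that respond to edges.

```python
def haar_dwt2(image):
    """One level of the orthonormal 2D Haar transform on a list-of-lists image."""
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("this sketch assumes even height and width")
    bands = {"ll": [], "col_diff": [], "row_diff": [], "diag": []}
    for r in range(0, rows, 2):
        out = {name: [] for name in bands}
        for c in range(0, cols, 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            out["ll"].append((a + b + d + e) / 2)        # local average (approximation)
            out["col_diff"].append((a - b + d - e) / 2)  # change across columns (vertical edges)
            out["row_diff"].append((a + b - d - e) / 2)  # change across rows (horizontal edges)
            out["diag"].append((a - b - d + e) / 2)      # diagonal detail
        for name in bands:
            bands[name].append(out[name])
    return bands


def haar_pyramid(image, levels):
    """Repeat the transform on the approximation band to get coarser scales."""
    approx, details = image, []
    for _ in range(levels):
        bands = haar_dwt2(approx)
        approx = bands.pop("ll")
        details.append(bands)
    return approx, details


# A 4x4 image with a vertical step edge between the first two columns.
image = [[0, 1, 1, 1] for _ in range(4)]
bands = haar_dwt2(image)
print(bands["col_diff"])  # [[-1.0, 0.0], [-1.0, 0.0]]
print(bands["row_diff"])  # [[0.0, 0.0], [0.0, 0.0]]
```

Only the band that measures change across columns responds to the vertical edge, which is the sense in which wavelet coefficients can act as an edge prior. Because the transform is orthogonal, the sum of squared coefficients over all four bands equals the sum of squared pixel values.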