Hyb-KAN ViT: Hybrid Kolmogorov-Arnold Networks Augmented Vision Transformer

📅 2025-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses the limited modeling capacity, lack of multi-scale representation, and weak edge awareness of the MLP modules in Vision Transformers (ViTs). To this end, it proposes Hyb-KAN ViT, a vision backbone that combines orthogonal wavelet multi-resolution analysis with spline-based Kolmogorov–Arnold Networks (KANs). The architecture rests on two complementary modules: Eff-KAN, which replaces MLPs with learnable B-spline activations, and Wav-KAN, a joint frequency-spatial feature extractor that embeds wavelet-derived edge priors. The authors present this as the first systematic integration of wavelet priors and KAN modules into both ViT encoders and classification heads. Experiments on ImageNet-1K, COCO, and ADE20K report state-of-the-art results in classification, detection, and segmentation, with gains in both parameter efficiency and accuracy. Ablation studies indicate that wavelet priors chiefly benefit segmentation, while the spline-based design chiefly improves detection.
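To make "replacing MLPs with learnable B-spline activations" concrete, here is a minimal NumPy sketch of a spline-based KAN layer. It assumes the parameterization of the original KAN formulation (each input-output edge carries a SiLU base term plus a learnable B-spline) on a fixed, uniform cubic grid over [-1, 1]. The names `bspline_basis` and `EffKANLayerSketch`, the grid size, and the initialization are hypothetical illustrations, not the paper's Eff-KAN code.

```python
import numpy as np


def bspline_basis(x, grid, k):
    """Evaluate all order-k B-spline bases on an extended knot grid.

    x: (n, d) inputs; grid: (d, G + 2k + 1) knots per input feature.
    Returns (n, d, G + k) basis values via the Cox-de Boor recursion.
    """
    x = x[..., None]  # (n, d, 1) so it broadcasts against the knots
    # Order-0 bases: indicator of each half-open knot interval [t_i, t_{i+1}).
    bases = ((x >= grid[:, :-1]) & (x < grid[:, 1:])).astype(x.dtype)
    for p in range(1, k + 1):
        # Blend neighbouring lower-order bases with linear ramps.
        left = (x - grid[:, : -(p + 1)]) / (grid[:, p:-1] - grid[:, : -(p + 1)])
        right = (grid[:, p + 1 :] - x) / (grid[:, p + 1 :] - grid[:, 1:-p])
        bases = left * bases[..., :-1] + right * bases[..., 1:]
    return bases


class EffKANLayerSketch:
    """Hypothetical spline-KAN layer: y_o = sum_i [w_oi * silu(x_i) + spline_oi(x_i)]."""

    def __init__(self, d_in, d_out, grid_size=5, k=3, rng=None):
        rng = np.random.default_rng(0) if rng is None else rng
        h = 2.0 / grid_size
        # Uniform knots extended by k on each side so the bases cover [-1, 1].
        knots = np.arange(-k, grid_size + k + 1) * h - 1.0
        self.grid = np.tile(knots, (d_in, 1))
        self.k = k
        self.base_w = rng.normal(0.0, 1.0 / np.sqrt(d_in), (d_out, d_in))
        self.spline_w = rng.normal(0.0, 0.1, (d_out, d_in, grid_size + k))

    def __call__(self, x):
        silu = x / (1.0 + np.exp(-x))  # residual base branch
        base = silu @ self.base_w.T  # (n, d_out)
        basis = bspline_basis(x, self.grid, self.k)  # (n, d_in, G + k)
        spline = np.einsum("ndc,odc->no", basis, self.spline_w)
        return base + spline
```

Because the cubic bases form a partition of unity on [-1, 1], each learnable spline is a smooth, locally controlled function of its input, which is what distinguishes this layer from a fixed-activation MLP.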

📝 Abstract

This study addresses the inherent limitations of Multi-Layer Perceptrons (MLPs) in Vision Transformers (ViTs) by introducing Hybrid Kolmogorov-Arnold Network (KAN)-ViT (Hyb-KAN ViT), a novel framework that integrates wavelet-based spectral decomposition and spline-optimized activation functions. Prior work has not focused on the prebuilt modularity of the ViT architecture or on integrating the edge-detection capabilities of wavelet functions. We propose two key modules: Efficient-KAN (Eff-KAN), which replaces MLP layers with spline functions, and Wavelet-KAN (Wav-KAN), which leverages orthogonal wavelet transforms for multi-resolution feature extraction. These modules are systematically integrated into ViT encoder layers and classification heads to enhance spatial-frequency modeling while mitigating computational bottlenecks. Experiments on ImageNet-1K (Image Recognition), COCO (Object Detection and Instance Segmentation), and ADE20K (Semantic Segmentation) demonstrate state-of-the-art performance with Hyb-KAN ViT. Ablation studies validate the efficacy of wavelet-driven spectral priors in segmentation and spline-based efficiency in detection tasks. The framework establishes a new paradigm for balancing parameter efficiency and multi-scale representation in vision architectures.
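The claim that orthogonal wavelet transforms provide both multi-resolution features and edge detection can be illustrated with the simplest orthogonal wavelet, the Haar wavelet. The sketch below is an assumption for illustration only: the abstract does not say which wavelet family Hyb-KAN ViT uses or where the transform sits in the network, and `haar_dwt2` and `haar_pyramid` are hypothetical helper names.

```python
import numpy as np


def haar_dwt2(img):
    """One level of the orthonormal 2D Haar transform on an (H, W) array with H, W even."""
    s = np.sqrt(2.0)
    lo = (img[:, 0::2] + img[:, 1::2]) / s  # low-pass along width
    hi = (img[:, 0::2] - img[:, 1::2]) / s  # high-pass along width
    ll = (lo[0::2] + lo[1::2]) / s  # approximation: half-resolution image
    lh = (lo[0::2] - lo[1::2]) / s  # changes along height: horizontal edges
    hl = (hi[0::2] + hi[1::2]) / s  # changes along width: vertical edges
    hh = (hi[0::2] - hi[1::2]) / s  # diagonal detail
    return ll, lh, hl, hh


def haar_pyramid(img, levels):
    """Multi-resolution analysis: repeatedly decompose the approximation band."""
    bands = []
    ll = img
    for _ in range(levels):
        ll, lh, hl, hh = haar_dwt2(ll)
        bands.append((lh, hl, hh))  # detail bands at this scale
    return ll, bands
```

For an 8×8 image that is 0 in columns 0 to 2 and 1 elsewhere (a single vertical step edge), `lh` and `hh` are all zeros and `hl` is non-zero only at the edge, while the total energy of the four bands equals that of the input because the transform is orthonormal. Each pyramid level halves the resolution of the approximation band, giving the multi-scale detail maps an edge-aware module could consume.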

Problem

Research questions and friction points this paper is trying to address.

Replacing MLPs in ViTs with spline-optimized KAN modules

Integrating wavelet transforms for multi-resolution feature extraction

Balancing parameter efficiency and multi-scale representation
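The parameter-efficiency trade-off can be made concrete with a quick count. The sketch assumes a spline-KAN layer that stores one base (SiLU) weight plus `grid_size + spline_order` B-spline coefficients per input-output edge, with no biases or extra scalers; it uses ViT-Base feed-forward widths (768 → 3072 → 768) only as a familiar example, not as the paper's configuration. The helper names are hypothetical.

```python
def mlp_params(d, hidden):
    # Two Linear layers with biases: d -> hidden -> d.
    return d * hidden + hidden + hidden * d + d


def kan_params(widths, grid_size=5, spline_order=3):
    # Assumed per-edge cost: B-spline coefficients plus one base weight.
    per_edge = grid_size + spline_order + 1
    return sum(a * b * per_edge for a, b in zip(widths[:-1], widths[1:]))


def matching_kan_hidden(d, mlp_hidden, grid_size=5, spline_order=3):
    # Largest hidden width h such that a [d, h, d] KAN fits in the MLP's budget.
    budget = mlp_params(d, mlp_hidden)
    per_edge = grid_size + spline_order + 1
    return budget // (2 * d * per_edge)


print(mlp_params(768, 3072))            # 4722432
print(kan_params([768, 3072, 768]))     # 42467328
print(matching_kan_hidden(768, 3072))   # 341
```

Under these assumptions, a KAN with the same widths as the MLP costs about nine times as many parameters, so a parameter-matched KAN block must use a much narrower hidden layer (about 341 units here). Any efficiency gain therefore depends on how the spline and wavelet modules are sized and where they are placed.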

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid KAN-ViT integrates wavelet and spline functions

Eff-KAN replaces MLPs with spline-optimized activations

Wav-KAN uses wavelet transforms for multi-resolution features
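A wavelet-based KAN layer replaces each edge's spline with a scaled and translated mother wavelet, psi((x - tau) / s), whose scale s and translation tau are learnable. The NumPy sketch below uses the Mexican-hat (Ricker) wavelet because it has a simple closed form. Note that it is a continuous wavelet, not an orthogonal one, so it is an assumption for illustration rather than the family Hyb-KAN ViT uses. `mexican_hat` and `WavKANLayerSketch` are hypothetical names, and training (gradients for `scale`, `translation`, `weight`) is omitted.

```python
import numpy as np


def mexican_hat(t):
    # Ricker / Mexican-hat mother wavelet, normalized to unit L2 norm.
    c = 2.0 / (np.sqrt(3.0) * np.pi ** 0.25)
    return c * (1.0 - t ** 2) * np.exp(-0.5 * t ** 2)


class WavKANLayerSketch:
    """Hypothetical wavelet-KAN layer: y_o = sum_i w_oi * psi((x_i - tau_oi) / s_oi)."""

    def __init__(self, d_in, d_out, rng=None):
        rng = np.random.default_rng(0) if rng is None else rng
        self.scale = np.ones((d_out, d_in))  # learnable dilation s per edge
        self.translation = np.zeros((d_out, d_in))  # learnable shift tau per edge
        self.weight = rng.normal(0.0, 1.0 / np.sqrt(d_in), (d_out, d_in))

    def __call__(self, x):
        # x: (n, d_in) -> t: (n, d_out, d_in), one wavelet argument per edge.
        t = (x[:, None, :] - self.translation) / self.scale
        return (mexican_hat(t) * self.weight).sum(axis=-1)  # (n, d_out)
```

Small learned scales make an edge respond to sharp, local changes in its input, and large scales make it respond to smooth trends. This is the per-edge analogue of the multi-resolution behaviour that a wavelet decomposition provides over an image.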
