Exploring Kolmogorov-Arnold Network Expansions in Vision Transformers for Mitigating Catastrophic Forgetting in Continual Learning

📅 2025-07-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Vision Transformers (ViTs) suffer from catastrophic forgetting in continual learning due to rigid, globally updated multilayer perceptrons (MLPs). Method: This work introduces the Kolmogorov-Arnold Network (KAN), a function approximator with local plasticity, into the ViT architecture, replacing the standard MLP blocks. KAN employs learnable spline-based activation functions that enable sparse, task-adaptive parameter updates, thereby isolating parameter interference between sequential tasks and enhancing knowledge retention. Contribution/Results: Evaluated on standard continual learning benchmarks, including MNIST and CIFAR-100, KAN-ViT achieves substantial improvements over baseline ViTs: average accuracy gains of 3.2-5.7 percentage points and forgetting-rate reductions of 42%-61%. This study establishes a novel paradigm for continual learning in ViTs and empirically validates structured plasticity grounded in functional approximation as an effective mechanism for mitigating catastrophic forgetting.

📝 Abstract
Continual learning (CL), the ability of a model to learn new tasks without forgetting previously acquired knowledge, remains a critical challenge in artificial intelligence, particularly for vision transformers (ViTs) that rely on multilayer perceptrons (MLPs) for global representation learning. Catastrophic forgetting, where new information overwrites prior knowledge, is especially problematic in these models. This research proposes replacing the MLPs in ViTs with Kolmogorov-Arnold Networks (KANs) to address this issue. KANs leverage local plasticity through spline-based activations, ensuring that only a subset of parameters is updated per sample, thereby preserving previously learned knowledge. The study investigates the efficacy of KAN-based ViTs in CL scenarios across benchmark datasets (MNIST, CIFAR-100), focusing on their ability to retain accuracy on earlier tasks while adapting to new ones. Experimental results demonstrate that KAN-based ViTs significantly mitigate catastrophic forgetting, outperforming traditional MLP-based ViTs in knowledge retention and task adaptation. This novel integration of KANs into ViTs represents a promising step toward more robust and adaptable models for dynamic environments.
Problem

Research questions and friction points this paper is trying to address.

Mitigating catastrophic forgetting in continual learning
Replacing MLPs with KANs in vision transformers
Improving knowledge retention in dynamic environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Replace MLPs with KANs in ViTs
Use spline-based activations for local plasticity
Update subset of parameters per sample
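The locality argument behind these points can be made concrete with a toy sketch. Below is a minimal NumPy illustration of a single KAN edge, a learnable univariate function built from basis functions on a grid; it uses piecewise-linear "hat" bases for simplicity (the paper's KANs use B-splines inside a ViT, and `SplineEdge` and its parameters are illustrative names, not from the paper). Because only the basis functions bracketing the input are nonzero, a gradient step touches only those coefficients, unlike an MLP weight matrix where every weight receives a gradient for every sample.

```python
import numpy as np

def hat_basis(x, grid):
    """Piecewise-linear (hat) basis values at x, one per grid knot.
    Only the two knots bracketing x are nonzero, which is the source
    of local plasticity: each sample touches few coefficients."""
    B = np.zeros_like(grid)
    x = np.clip(x, grid[0], grid[-1])
    j = np.clip(np.searchsorted(grid, x) - 1, 0, len(grid) - 2)
    t = (x - grid[j]) / (grid[j + 1] - grid[j])
    B[j], B[j + 1] = 1.0 - t, t
    return B

class SplineEdge:
    """One KAN edge: a learnable function phi(x) = sum_i c_i * B_i(x).
    Minimal sketch only; not the paper's implementation."""
    def __init__(self, n_knots=8, lo=-1.0, hi=1.0):
        self.grid = np.linspace(lo, hi, n_knots)
        self.coef = np.zeros(n_knots)

    def forward(self, x):
        return float(self.coef @ hat_basis(x, self.grid))

    def sgd_step(self, x, target, lr=0.5):
        B = hat_basis(x, self.grid)
        err = self.forward(x) - target
        grad = err * B              # zero except at the 2 active knots
        self.coef -= lr * grad
        return grad

edge = SplineEdge()
grad = edge.sgd_step(x=0.3, target=1.0)
print(np.count_nonzero(grad))       # only 2 of 8 coefficients updated
```

With 8 knots, a training step at `x=0.3` leaves 6 of the 8 coefficients untouched, so earlier knowledge stored in distant regions of the input range is preserved; this per-sample sparsity is what the bullet points above refer to.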
🔎 Similar Papers