Hardware Acceleration of Kolmogorov-Arnold Network (KAN) in Large-Scale Systems

📅 2025-09-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the substantial hardware overhead and poor scalability of Kolmogorov–Arnold Networks (KANs) caused by computationally intensive B-spline evaluations, this work proposes an algorithm–hardware co-optimized architecture. First, it introduces a joint quantization scheme, Alignment-Symmetry and PowerGap, integrated with sparsity-aware mapping. Second, it designs an N:1 time-domain modulated dynamic voltage input generator to relax input precision requirements. Third, it implements an analog compute-in-memory (ACIM) circuit based on resistive random-access memory (RRAM), prototyped in TSMC 22 nm technology. Experimental results show that when the parameter count grows by 500K× to 807K× relative to prior tiny-scale tasks, the area overhead grows by only 28K× to 41K×, power consumption by only 51× to 94×, and accuracy degrades by a marginal 0.11%–0.23%. This significantly enhances scalability and energy efficiency, marking the first high-efficiency hardware acceleration of large-scale KANs.
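The page does not spell out the Alignment-Symmetry and PowerGap schemes themselves. As a point of reference only, a plain symmetric uniform quantizer — the common baseline that such hardware-aware schemes refine — can be sketched as follows; `quantize_symmetric` and its parameters are illustrative assumptions, not the paper's method:

```python
import numpy as np

def quantize_symmetric(w: np.ndarray, n_bits: int = 4):
    """Uniform symmetric quantization: the zero point is fixed at 0,
    so positive and negative coefficients share one aligned scale."""
    qmax = 2 ** (n_bits - 1) - 1          # e.g. 7 for 4-bit signed
    scale = np.max(np.abs(w)) / qmax      # one scale per tensor
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Reconstruct approximate float values from integer codes."""
    return q.astype(np.float32) * scale

# Example: quantize a small set of spline coefficients to 4 bits
coeffs = np.array([-0.8, -0.1, 0.0, 0.35, 0.9], dtype=np.float32)
q, s = quantize_symmetric(coeffs, n_bits=4)
recon = dequantize(q, s)
```

In a real co-designed flow, the quantizer would additionally be shaped to the ACIM array's conductance levels; this sketch only shows the symmetric-scale idea.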

📝 Abstract
Recent developments have introduced Kolmogorov-Arnold Networks (KAN), an innovative architectural paradigm capable of replicating conventional deep neural network (DNN) capabilities with significantly fewer parameters by employing parameterized B-spline functions with trainable coefficients. Nevertheless, the B-spline components inherent to KAN architectures introduce distinct hardware acceleration challenges. While B-spline evaluation can be accomplished through look-up table (LUT) implementations that directly encode the functional mappings, thus minimizing computational overhead, such approaches still demand considerable circuit infrastructure, including LUTs, multiplexers, decoders, and related components. This work presents an algorithm-hardware co-design approach for KAN acceleration. At the algorithmic level, the techniques include Alignment-Symmetry and PowerGap KAN hardware-aware quantization and a KAN sparsity-aware mapping strategy; at the circuit level, they include an N:1 Time Modulation Dynamic Voltage input generator with analog compute-in-memory (ACIM) circuits. Evaluations on large-scale KAN networks validate the proposed methodologies. Non-ideality factors, including partial-sum deviations from process variations, have been evaluated with statistics measured from TSMC 22nm RRAM-ACIM prototype chips. Utilizing optimally determined KAN hyperparameters in conjunction with circuit optimizations fabricated at the 22nm technology node, and despite the parameter count for large-scale tasks in this work increasing by 500K× to 807K× compared to the tiny-scale tasks of previous work, the area overhead increases by only 28K× to 41K×, power consumption rises by merely 51× to 94×, and accuracy degradation remains minimal at 0.11% to 0.23%, demonstrating the scaling potential of the proposed architecture.
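The LUT idea the abstract describes — trading per-input B-spline basis evaluation for a precomputed indexed read — can be sketched minimally as below. The stand-in activation, the table size, and the nearest-entry read are illustrative assumptions; the paper's design implements this in analog circuitry, not software:

```python
import numpy as np

def build_spline_lut(spline_fn, x_min=-1.0, x_max=1.0, n_entries=64):
    """Precompute a look-up table over the input range, so each
    activation becomes a single table read instead of a full
    B-spline basis evaluation."""
    grid = np.linspace(x_min, x_max, n_entries)
    return grid, spline_fn(grid)

def lut_eval(x, grid, table):
    """Nearest-entry LUT read; real designs may interpolate between
    entries or fold the read into the compute-in-memory array."""
    step = grid[1] - grid[0]
    idx = np.clip(np.round((x - grid[0]) / step).astype(int),
                  0, len(table) - 1)
    return table[idx]

# Example: tabulate a smooth learned activation (arbitrary stand-in)
grid, table = build_spline_lut(lambda t: t * np.exp(-t ** 2))
y = lut_eval(np.array([-0.5, 0.0, 0.5]), grid, table)
```

The circuit-cost point in the abstract follows directly: each `lut_eval` in hardware needs the table storage plus a decoder/multiplexer to select the entry, which is exactly the infrastructure the co-design aims to shrink.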
Problem

Research questions and friction points this paper is trying to address.

Hardware acceleration challenges in Kolmogorov-Arnold Networks
Evaluating B-spline functions efficiently in hardware circuits
Scaling KAN architectures for large systems with minimal overhead
Innovation

Methods, ideas, or system contributions that make the work stand out.

Algorithm-hardware co-design for KAN acceleration
ACIM circuits with dynamic voltage input generator
Hardware-aware quantization and sparsity mapping
Wei-Hsing Huang — School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332 USA
Jianwei Jia — School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332 USA
Yuyao Kong — School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332 USA
Faaiq Waqar — Ph.D. Student at the Georgia Institute of Technology (Integrated Circuits, Amorphous Oxide Transistors, AI Hardware, Nanotechnology, Nanoporous Materials)
Tai-Hao Wen — Department of Electrical Engineering, National Tsing Hua University (NTHU), Hsinchu 30013, Taiwan
Meng-Fan Chang — National Tsing Hua University, Taiwan (Memory Circuit Designs, Compute-in-Memory, AI Chips, Memristor-Based Neuromorphic Circuits, etc.)
Shimeng Yu — Georgia Institute of Technology, Dean's Professor (Non-volatile Memory, RRAM, Ferroelectric Memories, In-Memory Computing, AI Hardware)