Effective Integration of KAN for Keyword Spotting

📅 2024-09-13
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
To address the performance limitations of keyword spotting (KWS) on resource-constrained smart devices, this paper introduces the Kolmogorov–Arnold Network (KAN) to speech-based KWS for the first time, proposing a modeling framework that integrates KAN with a lightweight 1D CNN. The method leverages KAN's expressivity on low-dimensional, high-level semantic features while exploiting the CNN's efficiency in capturing local time-frequency patterns. The authors design multiple learnable integration strategies and train end to end on standard acoustic features (e.g., MFCCs and log-Mel spectrograms). Evaluated on mainstream KWS benchmarks (e.g., Google Speech Commands), the proposed model significantly outperforms pure CNN baselines, achieving absolute accuracy gains of 2.3–4.1% and demonstrating enhanced robustness. This work validates KAN's effectiveness and potential for lightweight temporal speech modeling in edge-deployable KWS systems.
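Neither the summary nor the abstract defines what a KAN layer computes. For reference, the block below gives the formulation from the original KAN paper (Liu et al., 2024): every edge carries its own learnable univariate function instead of a fixed node activation. This page does not say which basis functions, grid size G, or spline degree k the KWS paper used, so the SiLU-plus-B-spline edge function shown here is an assumption taken from that original formulation.

```latex
% Kolmogorov--Arnold representation theorem (continuous f on [0,1]^n)
f(x_1, \dots, x_n) = \sum_{q=1}^{2n+1} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)

% One KAN layer: n_l inputs -> n_{l+1} outputs, one learnable \phi per edge
x_{l+1,j} = \sum_{i=1}^{n_l} \phi_{l,j,i}\!\left(x_{l,i}\right), \qquad j = 1, \dots, n_{l+1}

% Edge function: SiLU base branch plus a degree-k B-spline on a grid of G intervals
\phi(x) = w_b \, \mathrm{silu}(x) + w_s \sum_{m=1}^{G+k} c_m B_m(x), \qquad \mathrm{silu}(x) = \frac{x}{1 + e^{-x}}

% Parameters per layer (G + k spline coefficients plus w_b and w_s on each edge)
N_{\text{params}} = n_l \, n_{l+1} \, (G + k + 2)
```

Because the parameter count scales with n_l n_{l+1} (G + k + 2) rather than n_l n_{l+1}, a KAN layer is far cheaper on a small embedding than on wide feature maps. This is one plausible reading of why the summary places KAN on low-dimensional features; the page does not state the authors' own reasoning.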

📝 Abstract
Keyword spotting (KWS) is an important speech processing component for smart devices with voice assistant capabilities. In this paper, we investigate whether Kolmogorov-Arnold Networks (KAN) can be used to enhance the performance of KWS. We explore various approaches to integrating KAN into a model architecture based on 1D Convolutional Neural Networks (CNN). We find that KAN is effective at modeling high-level features in lower-dimensional spaces, resulting in improved KWS performance when integrated appropriately. The findings shed light on KAN for speech processing tasks and may inform future research on other modalities.

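The abstract does not specify where or how KAN is inserted into the 1D CNN. The sketch below is a minimal, hypothetical illustration of one arrangement consistent with the description: a 1D convolution over acoustic feature frames, global average pooling down to a small embedding, and a KAN layer as the classification head. All names (`conv1d`, `KANLayer`, `kws_forward`), the sizes, the tanh squashing, the random stand-in for MFCC frames, and the cubic B-spline plus SiLU edge function (borrowed from the original KAN formulation) are assumptions, not the authors' configuration. It uses only the Python standard library and covers inference only, with no training loop.

```python
import math
import random


def silu(x: float) -> float:
    """SiLU activation, used as the residual 'base' branch of each KAN edge."""
    return x / (1.0 + math.exp(-x))


def bspline_basis(x: float, knots: list[float], k: int) -> list[float]:
    """Cox-de Boor recursion: values of every degree-k B-spline basis at x."""
    # Degree 0: indicator of the half-open knot interval containing x.
    b = [1.0 if knots[i] <= x < knots[i + 1] else 0.0 for i in range(len(knots) - 1)]
    for d in range(1, k + 1):
        nxt = []
        for i in range(len(knots) - 1 - d):
            left_den = knots[i + d] - knots[i]
            right_den = knots[i + d + 1] - knots[i + 1]
            left = (x - knots[i]) / left_den * b[i] if left_den else 0.0
            right = (knots[i + d + 1] - x) / right_den * b[i + 1] if right_den else 0.0
            nxt.append(left + right)
        b = nxt
    return b


class KANLayer:
    """Hypothetical KAN layer: one learnable univariate function per (input, output) edge."""

    def __init__(self, in_dim: int, out_dim: int, grid_size: int = 5, k: int = 3, seed: int = 0):
        rng = random.Random(seed)
        h = 2.0 / grid_size
        self.k = k
        # Uniform grid on [-1, 1], extended by k knots on each side so the
        # grid_size + k bases cover the whole interval.
        self.knots = [-1.0 + (i - k) * h for i in range(grid_size + 2 * k + 1)]
        n_basis = grid_size + k
        self.coef = [[[rng.gauss(0.0, 0.1) for _ in range(n_basis)]
                      for _ in range(in_dim)] for _ in range(out_dim)]
        self.w_base = [[rng.gauss(0.0, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]
        self.w_spline = [[1.0] * in_dim for _ in range(out_dim)]

    def forward(self, x: list[float]) -> list[float]:
        bases = [bspline_basis(xi, self.knots, self.k) for xi in x]
        out = []
        for j in range(len(self.coef)):
            total = 0.0
            for i, xi in enumerate(x):
                spline = sum(c * b for c, b in zip(self.coef[j][i], bases[i]))
                # phi_{j,i}(x_i) = w_base * silu(x_i) + w_spline * spline(x_i)
                total += self.w_base[j][i] * silu(xi) + self.w_spline[j][i] * spline
            out.append(total)
        return out


def conv1d(frames: list[list[float]], kernels: list[list[list[float]]],
           bias: list[float]) -> list[list[float]]:
    """Valid 1D convolution over time plus ReLU; frames is [T][C_in], kernels is [C_out][K][C_in]."""
    width = len(kernels[0])
    out = []
    for t in range(len(frames) - width + 1):
        row = []
        for o, kern in enumerate(kernels):
            s = bias[o]
            for dk in range(width):
                s += sum(w * f for w, f in zip(kern[dk], frames[t + dk]))
            row.append(max(0.0, s))
        out.append(row)
    return out


def global_avg_pool(seq: list[list[float]]) -> list[float]:
    """Average each channel over time, giving one low-dimensional vector per utterance."""
    return [sum(step[c] for step in seq) / len(seq) for c in range(len(seq[0]))]


def kws_forward(frames, conv_k, conv_b, kan: KANLayer) -> list[float]:
    h = conv1d(frames, conv_k, conv_b)                   # local time-frequency patterns
    z = [math.tanh(v) for v in global_avg_pool(h)]       # squash into the spline grid [-1, 1]
    logits = kan.forward(z)                              # KAN head on the small embedding
    m = max(logits)
    exps = [math.exp(logit - m) for logit in logits]     # numerically stable softmax
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    rng = random.Random(42)
    T, C_IN, C_OUT, K, N_KEYWORDS = 20, 8, 4, 3, 3
    frames = [[rng.gauss(0.0, 1.0) for _ in range(C_IN)] for _ in range(T)]  # stand-in for MFCCs
    conv_k = [[[rng.gauss(0.0, 0.3) for _ in range(C_IN)] for _ in range(K)] for _ in range(C_OUT)]
    probs = kws_forward(frames, conv_k, [0.0] * C_OUT, KANLayer(C_OUT, N_KEYWORDS))
    print([round(p, 3) for p in probs])
```

Squashing the pooled embedding with tanh keeps every KAN input inside [-1, 1], where the cubic B-spline bases sum to one; inputs outside that range would fall on only part of the basis. Applying KAN after pooling also means the per-edge spline cost is paid on a 4-dimensional vector rather than on every frame.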
Problem

Research questions and friction points this paper is trying to address.

Keyword Recognition
Smart Devices
Speech Assistants Performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Kolmogorov-Arnold Network
Keyword Spotting
1D Convolutional Neural Network

Authors

Anfeng Xu
University of Southern California
Speech Processing, Multimodal AI, LLM, Deep Learning

Biqiao Zhang
Meta AI, USA

Shuyu Kong
Meta AI, USA

Yiteng Huang
Meta AI, USA

Zhaojun Yang
Research Scientist, Facebook
Affective computing, machine learning, multimodal modeling, spoken dialog system

Sangeeta Srivastava
Meta AI, USA

Ming Sun
Meta AI, USA