TsetlinKWS: A 65nm 16.58uW, 0.63mm2 State-Driven Convolutional Tsetlin Machine-Based Accelerator For Keyword Spotting

📅 2025-10-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the accuracy and energy-efficiency bottlenecks of Tsetlin Machines (TMs) in ultra-low-power keyword spotting, this work proposes the first algorithm-hardware co-design framework for convolutional Tsetlin Machines. Methodologically: (i) MFSC-SF acoustic features and spectral convolution are introduced, boosting accuracy to 87.35%; (ii) an OG-BCSR sparse storage algorithm compresses model size by 9.84×; (iii) a state-driven hardware architecture is developed to jointly exploit data reuse and structural sparsity. Implemented in 65 nm CMOS, the chip occupies only 0.63 mm² core area, achieves an inference power of 16.58 μW, and requires merely 907k logic operations per inference, a 10× reduction in operation count relative to state-of-the-art KWS accelerators. This work marks the first systematic integration of high accuracy, high sparsity, and ultra-low-power speech inference in a single TM-based solution.
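For readers unfamiliar with TM inference, a minimal sketch of the general mechanism follows (this is textbook Tsetlin Machine voting, not the paper's specific state-driven design): each clause is a conjunction (AND) of Boolean literals, clauses vote for or against a class, and the sign of the summed votes decides the prediction.

```python
# Minimal sketch of generic Tsetlin Machine inference (illustrative only;
# the paper's convolutional, state-driven variant differs in detail).

def eval_clause(literals, x):
    """literals: list of (feature_index, negated) pairs; x: list of 0/1 bits."""
    for idx, negated in literals:
        val = 1 - x[idx] if negated else x[idx]
        if val == 0:
            return 0  # one failing literal falsifies the whole conjunction
    return 1

def class_sum(pos_clauses, neg_clauses, x):
    """Positive-polarity clauses vote +1, negative-polarity clauses vote -1."""
    return (sum(eval_clause(c, x) for c in pos_clauses)
            - sum(eval_clause(c, x) for c in neg_clauses))

# Example: clause "x0 AND NOT x1" votes for the class, clause "x1" against.
pos = [[(0, False), (1, True)]]
neg = [[(1, False)]]
print(class_sum(pos, neg, [1, 0]))  # positive clause fires -> 1
```

Because inference reduces to AND/NOT/popcount logic with no multiplications, this is what makes TMs attractive for the kind of ultra-low-power logic implementation the paper targets.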

📝 Abstract
The Tsetlin Machine (TM) has recently attracted attention as a low-power alternative to neural networks due to its simple and interpretable inference mechanisms. However, its performance on speech-related tasks remains limited. This paper proposes TsetlinKWS, the first algorithm-hardware co-design framework for the Convolutional Tsetlin Machine (CTM) on the 12-keyword spotting task. Firstly, we introduce a novel Mel-Frequency Spectral Coefficient and Spectral Flux (MFSC-SF) feature extraction scheme together with spectral convolution, enabling the CTM to reach its first-ever competitive accuracy of 87.35% on the 12-keyword spotting task. Secondly, we develop an Optimized Grouped Block-Compressed Sparse Row (OG-BCSR) algorithm that achieves a remarkable 9.84× reduction in model size, significantly improving the storage efficiency on CTMs. Finally, we propose a state-driven architecture tailored for the CTM, which simultaneously exploits data reuse and sparsity to achieve high energy efficiency. The full system is evaluated in 65 nm process technology, consuming 16.58 μW at 0.7 V with a compact 0.63 mm² core area. TsetlinKWS requires only 907k logic operations per inference, representing a 10× reduction compared to the state-of-the-art KWS accelerators, positioning the CTM as a highly-efficient candidate for ultra-low-power speech applications.
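The spectral flux (SF) component of the feature scheme has a standard definition that is easy to illustrate; the sketch below shows that textbook form (the paper's exact MFSC-SF pipeline, frame sizes, and binarization are not reproduced here): flux is the positive frame-to-frame change in spectral magnitude, summed over frequency bins.

```python
import numpy as np

# Hedged sketch of spectral flux, the SF half of MFSC-SF. The input is a
# magnitude spectrogram of shape (frames, bins); the output is one flux
# value per frame transition.

def spectral_flux(spectrogram):
    diff = np.diff(spectrogram, axis=0)           # change between adjacent frames
    return np.sum(np.maximum(diff, 0.0), axis=1)  # keep only energy increases

spec = np.array([[1.0, 2.0],
                 [3.0, 1.0],    # bin 0 rises by 2, bin 1 falls
                 [3.0, 4.0]])   # bin 1 rises by 3
print(spectral_flux(spec))      # -> [2. 3.]
```

Rectifying the difference (keeping only increases) makes flux sensitive to onsets, which is plausibly why it complements the purely spectral MFSC features for keyword boundaries.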
Problem

Research questions and friction points this paper is trying to address.

Achieving competitive keyword spotting accuracy with the Tsetlin Machine
Reducing model size and improving storage efficiency for the CTM
Designing an energy-efficient hardware accelerator for ultra-low-power speech applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Novel MFSC-SF feature extraction with spectral convolution
Optimized Grouped Block-Compressed Sparse Row algorithm
State-driven architecture exploiting data reuse and sparsity
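To make the storage contribution concrete, here is a sketch of plain block-compressed sparse row (BCSR) storage, the baseline format that OG-BCSR builds on; the grouping and optimization steps that give the paper its 9.84× figure are not shown, only the basic idea of storing just the nonzero blocks:

```python
import numpy as np

# Hedged sketch of plain BCSR (not the paper's OG-BCSR): tile the matrix
# into fixed-size blocks and store only blocks that contain a nonzero,
# together with their block-column indices and per-row-of-blocks pointers.

def to_bcsr(m, bh, bw):
    """Return (row_ptr, col_idx, blocks) for m tiled into bh x bw blocks."""
    rows, cols = m.shape[0] // bh, m.shape[1] // bw
    row_ptr, col_idx, blocks = [0], [], []
    for br in range(rows):
        for bc in range(cols):
            blk = m[br*bh:(br+1)*bh, bc*bw:(bc+1)*bw]
            if np.any(blk):              # all-zero blocks are not stored
                col_idx.append(bc)
                blocks.append(blk.copy())
        row_ptr.append(len(col_idx))     # row_ptr[i+1] marks end of block-row i
    return row_ptr, col_idx, blocks

m = np.zeros((4, 4))
m[0, 1] = 5.0                            # a single nonzero entry
row_ptr, col_idx, blocks = to_bcsr(m, 2, 2)
print(len(blocks), col_idx)              # -> 1 [0]  (one 2x2 block kept)
```

For highly sparse CTM clause matrices, most blocks are all-zero and are skipped entirely, which is the storage win this family of formats provides.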