🤖 AI Summary
To address the accuracy and energy-efficiency bottlenecks of Tsetlin Machines (TMs) in ultra-low-power keyword spotting, this work proposes the first algorithm-hardware co-design framework for convolutional Tsetlin Machines. Methodologically: (i) MFSC-SF acoustic features and spectral convolution are introduced, boosting accuracy to 87.35%; (ii) an OG-BCSR sparse storage algorithm compresses model size by 9.84×; (iii) a state-driven hardware architecture is developed to jointly exploit data reuse and structural sparsity. Implemented in 65 nm CMOS, the chip occupies only 0.63 mm² core area, achieves an inference power of 16.58 μW, and requires merely 907k logic operations per inference—yielding a 10× energy-efficiency improvement over the state-of-the-art accelerators. This work marks the first systematic integration of high accuracy, high sparsity, and ultra-low-power speech inference in a single TM-based solution.
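The MFSC-SF feature pipeline above combines log mel filterbank energies with a spectral-flux channel. The summary does not specify frame sizes, filter counts, or the exact flux definition, so the following is a hedged sketch using common KWS defaults (16 kHz audio, 25 ms frames, 10 ms hop, 40 mel bands); the paper's actual MFSC-SF scheme may differ.

```python
# Sketch of MFSC + spectral-flux (SF) features: per-frame log mel
# filterbank energies plus one spectral-flux value per frame.
# Frame/hop/filter parameters are assumptions, not from the paper.
import numpy as np

def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
def mel_to_hz(m): return 700.0 * (10.0**(m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale.
    pts = mel_to_hz(np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for b in range(l, c):
            if c > l: fb[i, b] = (b - l) / (c - l)
        for b in range(c, r):
            if r > c: fb[i, b] = (r - b) / (r - c)
    return fb

def mfsc_sf(signal, sr=16000, frame=400, hop=160, n_fft=512, n_mels=40):
    # Windowed frames -> magnitude spectrogram.
    frames = [signal[s:s + frame] * np.hanning(frame)
              for s in range(0, len(signal) - frame + 1, hop)]
    spec = np.abs(np.fft.rfft(np.array(frames), n_fft))   # (T, n_fft//2+1)
    # MFSC: log mel filterbank energies.
    mfsc = np.log(spec @ mel_filterbank(n_mels, n_fft, sr).T + 1e-6)
    # Spectral flux: summed positive spectral change between frames.
    flux = np.sum(np.maximum(np.diff(spec, axis=0), 0.0), axis=1)
    flux = np.concatenate([[0.0], flux])                  # align to T frames
    return mfsc, flux
```

The `mfsc` matrix would feed the spectral convolution stage, while `flux` adds a coarse onset cue per frame.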
📝 Abstract
The Tsetlin Machine (TM) has recently attracted attention as a low-power alternative to neural networks due to its simple and interpretable inference mechanisms. However, its performance on speech-related tasks remains limited. This paper proposes TsetlinKWS, the first algorithm-hardware co-design framework for the Convolutional Tsetlin Machine (CTM) on the 12-keyword spotting task. Firstly, we introduce a novel Mel-Frequency Spectral Coefficient and Spectral Flux (MFSC-SF) feature extraction scheme together with spectral convolution, enabling the CTM to reach its first-ever competitive accuracy of 87.35% on the 12-keyword spotting task. Secondly, we develop an Optimized Grouped Block-Compressed Sparse Row (OG-BCSR) algorithm that achieves a remarkable 9.84× reduction in model size, significantly improving the storage efficiency of CTMs. Finally, we propose a state-driven architecture tailored for the CTM, which simultaneously exploits data reuse and sparsity to achieve high energy efficiency. The full system is evaluated in 65 nm process technology, consuming 16.58 μW at 0.7 V with a compact 0.63 mm² core area. TsetlinKWS requires only 907k logic operations per inference, representing a 10× reduction compared to the state-of-the-art KWS accelerators, positioning the CTM as a highly efficient candidate for ultra-low-power speech applications.
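The abstract names OG-BCSR but does not describe its internals. As a point of reference, plain block-compressed sparse row (BCSR) storage keeps only the nonzero blocks of a matrix plus two index arrays; the sketch below illustrates standard BCSR, not the paper's optimized grouped variant, and the block dimensions are assumptions.

```python
# Minimal block-CSR (BCSR) sketch: store only blocks containing at
# least one nonzero, with per-block column indices and row pointers.
# This is plain BCSR for illustration; OG-BCSR adds grouping and
# optimizations not detailed in the abstract.
import numpy as np

def bcsr_encode(mat, bh, bw):
    """Partition mat into bh x bw blocks; keep only nonzero blocks."""
    rows, cols = mat.shape
    assert rows % bh == 0 and cols % bw == 0
    blocks, col_idx, row_ptr = [], [], [0]
    for br in range(rows // bh):
        for bc in range(cols // bw):
            blk = mat[br * bh:(br + 1) * bh, bc * bw:(bc + 1) * bw]
            if blk.any():
                blocks.append(blk.copy())
                col_idx.append(bc)
        row_ptr.append(len(blocks))   # cumulative block count per row
    return blocks, col_idx, row_ptr

def bcsr_decode(blocks, col_idx, row_ptr, shape, bh, bw):
    """Reconstruct the dense matrix from BCSR storage."""
    out = np.zeros(shape, dtype=blocks[0].dtype if blocks else np.int8)
    for br in range(len(row_ptr) - 1):
        for k in range(row_ptr[br], row_ptr[br + 1]):
            bc = col_idx[k]
            out[br * bh:(br + 1) * bh, bc * bw:(bc + 1) * bw] = blocks[k]
    return out
```

For a highly sparse clause-literal matrix, the compression ratio depends on how nonzeros cluster into blocks, which is why a grouped, hardware-aware layout can outperform the vanilla format.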