End-to-End Efficiency in Keyword Spotting: A System-Level Approach for Embedded Microcontrollers

📅 2025-09-08
🤖 AI Summary
To address the challenges of deploying keyword spotting (KWS) on memory- and energy-constrained embedded microcontrollers (MCUs), this paper proposes an end-to-end co-optimization framework that systematically integrates MFCC feature extraction, lightweight neural network architecture, and MCU hardware acceleration. The authors design TKWS, a dedicated model incorporating MobileNet variants, residual connections, and insights from DS-CNN and LiCoNet, and deploy it on the STM32 N6, H7, and U5 platforms. With only 14.4k parameters, TKWS achieves up to a 92.4% F1-score on standard benchmarks. On the N6 platform, which integrates a neural accelerator, it attains the best energy-delay product (EDP) of the three platforms while keeping inference latency low even with high-resolution MFCCs. The key contribution is identifying and quantifying the software-hardware co-design factors, beyond accuracy, that govern real-world KWS efficiency, establishing a reproducible, ultra-lightweight approach to edge KWS deployment.

📝 Abstract
Keyword spotting (KWS) is a key enabling technology for hands-free interaction in embedded and IoT devices, where stringent memory and energy constraints challenge the deployment of AI-enabled devices. In this work, we systematically evaluate and compare several state-of-the-art lightweight neural network architectures, including DS-CNN, LiCoNet, and TENet, alongside our proposed Typman-KWS (TKWS) architecture built upon MobileNet, specifically designed for efficient KWS on microcontroller units (MCUs). Unlike prior studies focused solely on model inference, our analysis encompasses the entire processing pipeline, from Mel-Frequency Cepstral Coefficient (MFCC) feature extraction to neural inference, and is benchmarked across three STM32 platforms (N6, H7, and U5). Our results show that TKWS with three residual blocks achieves up to 92.4% F1-score with only 14.4k parameters, reducing memory footprint without compromising accuracy. Moreover, the N6 MCU with integrated neural acceleration achieves the best energy-delay product (EDP), enabling efficient, low-latency operation even with high-resolution features. Our findings highlight that model accuracy alone does not determine real-world effectiveness; rather, optimal keyword spotting deployments require careful consideration of feature extraction parameters and hardware-specific optimization.
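
The abstract ranks platforms by energy-delay product (EDP), which is conventionally the energy consumed per inference multiplied by the time that inference takes, so lower is better. The sketch below shows that calculation in Python. The function names and the platform labels `mcu_a` and `mcu_b` are hypothetical, and the numbers are placeholders for illustration, not measurements from the paper.

```python
def energy_delay_product(energy_joules: float, latency_seconds: float) -> float:
    """EDP = energy per inference x latency per inference (lower is better)."""
    if energy_joules < 0 or latency_seconds < 0:
        raise ValueError("energy and latency must be non-negative")
    return energy_joules * latency_seconds


def rank_by_edp(measurements):
    """Return (platform, EDP) pairs sorted from best (lowest EDP) to worst.

    `measurements` maps a platform name to (energy in J, latency in s).
    """
    scored = {
        name: energy_delay_product(energy, latency)
        for name, (energy, latency) in measurements.items()
    }
    return sorted(scored.items(), key=lambda item: item[1])


# Placeholder values, NOT results reported in the paper.
example = {
    "mcu_a": (2.0e-3, 10.0e-3),  # 2 mJ per inference, 10 ms latency
    "mcu_b": (0.5e-3, 2.0e-3),   # 0.5 mJ per inference, 2 ms latency
}
ranking = rank_by_edp(example)
print(ranking[0][0])  # platform with the lowest EDP
```

Because EDP multiplies the two quantities, a platform that is both faster and more frugal per inference improves on both factors at once, which is why an accelerator can dominate this metric even if its peak power is higher.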
Problem

Research questions and friction points this paper is trying to address.

Optimizing keyword spotting efficiency on microcontrollers
Balancing accuracy with memory and energy constraints
Evaluating full processing pipeline beyond model inference
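
One reason the full pipeline matters is that MFCC settings fix the size of the feature map the network must process. A minimal sketch of that relationship follows, assuming framing without padding. The function name and the settings used in the example (16 kHz audio, 1 s clip, 40 ms window, 20 ms hop, 10 coefficients) are illustrative assumptions, not the configuration used in the paper.

```python
def mfcc_feature_shape(sample_rate_hz, clip_seconds, window_ms, hop_ms, n_coeffs):
    """Return (n_frames, n_coeffs) for an MFCC front end without padding."""
    n_samples = int(sample_rate_hz * clip_seconds)
    window = int(sample_rate_hz * window_ms / 1000)  # samples per analysis window
    hop = int(sample_rate_hz * hop_ms / 1000)        # samples between window starts
    if window <= 0 or hop <= 0 or n_samples < window:
        raise ValueError("window and hop must be positive and fit in the clip")
    # One frame at offset 0, then one more for every full hop that still fits.
    n_frames = 1 + (n_samples - window) // hop
    return n_frames, n_coeffs


# Illustrative settings only (assumed, not taken from the paper).
frames, coeffs = mfcc_feature_shape(16000, 1.0, 40, 20, 10)
print(frames, coeffs, frames * coeffs)  # 49 10 490
```

Halving the hop or doubling the number of coefficients roughly doubles the input size, and with it the feature-extraction cost and the activation memory of the first layers.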
Innovation

Methods, ideas, or system contributions that make the work stand out.

TKWS architecture with MobileNet for MCUs
Full pipeline analysis from MFCC to inference
Hardware-specific optimization for energy efficiency
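
MobileNet-style models keep parameter counts low mainly by replacing standard convolutions with depthwise-separable ones. The sketch below compares the weight counts of the two layer types, ignoring biases and normalization parameters. It is a generic illustration: the kernel size and channel counts are assumed values, since this summary does not give the actual TKWS layer configuration.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard 2D convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise conv followed by a 1x1 pointwise conv (bias ignored)."""
    depthwise = kernel * kernel * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out            # 1x1 conv that mixes channels
    return depthwise + pointwise


# Assumed example layer: 3x3 kernel, 64 input and 64 output channels.
std = standard_conv_params(3, 64, 64)
sep = depthwise_separable_params(3, 64, 64)
print(std, sep)  # 36864 4672
```

For this assumed layer the separable form needs roughly an eighth of the weights, which illustrates how a complete model can stay in the range of tens of thousands of parameters.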
Pietro Bartoli
Politecnico di Milano
Machine Learning, TinyML, Microcontroller, Wearable, Smart Eyewear
Tommaso Bondini
Dept. of Electronics, Information and Bioengineering, Politecnico di Milano, Milan, Italy
Christian Veronesi
Dept. of Electronics, Information and Bioengineering, Politecnico di Milano, Milan, Italy
Andrea Giudici
Dept. of Electronics, Information and Bioengineering, Politecnico di Milano, Milan, Italy
Niccolò Antonello
Smart Eyewear Laboratory, EssilorLuxottica, Milan, Italy
Franco Zappa
Politecnico di Milano
Single Photon Avalanche Diode (SPAD), single photon detection, SPAD imagers, microelectronic instrumentation