AI Summary
Multi-channel keyword spotting (KWS) on edge devices must balance computational overhead, energy consumption, and accuracy. This work proposes ASAP-FE, a hardware-oriented feature-extraction front-end built on three ideas: (1) half-overlapped infinite impulse response (IIR) framing, which cuts redundant data by roughly 25% while preserving phoneme transition cues; (2) sparsity-aware data reduction, which combines frame skipping with stride-based filtering for about a further 50% reduction; and (3) dynamic parallel processing, which pairs a parameterizable filter cluster with priority-based scheduling. The design is functionally verified on FPGA prototypes and synthesized at 45 nm. Experiments with TC-ResNet8, DS-CNN, and KWT-1 show a 62.73% reduction in average workload and real-time processing of up to 32 channels. Accuracy drops by less than 1% relative to a fully overlapped baseline (for DS-CNN, 96.22% vs. 97.13%). With 15 parallel filters, the design reaches its best performance for up to 25 channels.
Abstract
Multi-channel keyword spotting (KWS) has become crucial for voice-based applications in edge environments. However, its substantial computational and energy requirements pose significant challenges. We introduce ASAP-FE (Agile Sparsity-Aware Parallelized-Feature Extractor), a hardware-oriented front-end designed to address these challenges. Our framework incorporates three key innovations: (1) Half-overlapped Infinite Impulse Response (IIR) Framing: This reduces redundant data by approximately 25% while maintaining essential phoneme transition cues. (2) Sparsity-aware Data Reduction: We exploit frame-level sparsity to achieve an additional 50% data reduction by combining frame skipping with stride-based filtering. (3) Dynamic Parallel Processing: We introduce a parameterizable filter cluster and a priority-based scheduling algorithm that allows parallel execution of IIR filtering tasks, reducing latency and optimizing energy efficiency. ASAP-FE is implemented with various filter cluster sizes on edge processors, with functionality verified on FPGA prototypes and designs synthesized at 45 nm. Experimental results using TC-ResNet8, DS-CNN, and KWT-1 demonstrate that ASAP-FE reduces the average workload by 62.73% while supporting real-time processing for up to 32 channels. Compared to a conventional fully overlapped baseline, ASAP-FE achieves less than a 1% accuracy drop (e.g., 96.22% vs. 97.13% for DS-CNN), which is well within acceptable limits for edge AI. By adjusting the number of filter modules, our design optimizes the trade-off between performance and energy, with 15 parallel filters providing optimal performance for up to 25 channels. Overall, ASAP-FE offers a practical and efficient solution for multi-channel KWS on energy-constrained edge devices.
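As a rough sanity check on the figures above, the sketch below shows how the two data-reduction stages might compose and what energy-based frame skipping could look like. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say that the 25% and 50% reductions multiply, nor how sparse frames are detected, so the multiplicative model, the mean-energy criterion, and the names `combined_reduction`, `mean_energy`, and `skip_sparse_frames` are all hypothetical.

```python
def combined_reduction(*stage_reductions):
    """Overall fraction of data removed when each stage removes the
    given fraction of whatever the previous stage left behind.
    ASSUMPTION: stages compose multiplicatively (not stated in the abstract)."""
    remaining = 1.0
    for reduction in stage_reductions:
        remaining *= 1.0 - reduction
    return 1.0 - remaining


def mean_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(sample * sample for sample in frame) / len(frame)


def skip_sparse_frames(frames, threshold):
    """Keep only frames whose mean energy reaches the threshold.
    ASSUMPTION: sparsity is judged by frame energy; the paper's actual
    criterion is not given in the abstract."""
    return [frame for frame in frames if mean_energy(frame) >= threshold]


# Half-overlapped framing (~25%) followed by sparsity-aware reduction (~50%).
overall = combined_reduction(0.25, 0.50)

# Toy input: one near-silent frame and one active frame.
frames = [[0.0, 0.01, -0.01, 0.0], [0.5, -0.4, 0.6, -0.5]]
kept = skip_sparse_frames(frames, threshold=0.01)
```

Under the multiplicative assumption, `overall` evaluates to 0.625, which is in the same range as the reported 62.73% average workload reduction; the match is suggestive only, since the reported figure is measured on real workloads rather than derived this way.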