🤖 AI Summary
Battery- and size-constrained wireless hearables face a fundamental trade-off between the high computational demands of real-time, on-device speech AI and strict low-power requirements. To address this, we propose a fully on-device speech AI system featuring: (1) the first ear-worn platform integrating a programmable speech AI accelerator; (2) a low-latency dual-path CNN-RNN hybrid architecture enabling frame-level streaming inference at a 6 ms frame shift; and (3) a hardware-software co-designed mixed-precision quantization framework with quantization-aware training. The system achieves a real-time inference latency of 5.54 ms per frame at a power consumption of only 71.6 mW. In a user study with 28 participants, it significantly outperforms existing on-device solutions in speech quality (PESQ) and noise suppression, thereby overcoming the deployment bottleneck for streaming deep learning on ultra-compact wearable devices.
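To make the quantization idea concrete, here is a minimal sketch of "fake quantization", the operation commonly used inside quantization-aware training: weights are rounded to a low-bit grid and mapped back to floats, so the network sees the rounding error during training. The summary does not specify bit widths or the quantization scheme, so the symmetric per-tensor scheme, the 8-bit and 4-bit settings, and the name `fake_quantize` are all assumptions made for illustration, not the authors' method.

```python
def fake_quantize(values, num_bits):
    """Quantize to a signed num_bits grid, then dequantize back to floats.

    Assumption: symmetric, per-tensor scaling (the source does not state the scheme).
    """
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)                  # nothing to scale
    scale = max_abs / qmax                   # float step between adjacent grid points
    out = []
    for v in values:
        q = round(v / scale)                 # nearest integer level
        q = max(-qmax, min(qmax, q))         # clamp to the representable range
        out.append(q * scale)                # back to float: the "fake" part
    return out


# Hypothetical weights; "mixed precision" means different layers get different widths.
weights = [0.8, -0.31, 0.05, -1.0]
w8 = fake_quantize(weights, 8)
w4 = fake_quantize(weights, 4)
err8 = max(abs(a - b) for a, b in zip(weights, w8))
err4 = max(abs(a - b) for a, b in zip(weights, w4))
print(f"max error: 8-bit {err8:.4f}, 4-bit {err4:.4f}")
```

Lower bit widths save memory and compute but increase rounding error, which is why a mixed-precision scheme would typically reserve higher precision for the layers most sensitive to it.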
📝 Abstract
The conventional wisdom has been that designing ultra-compact, battery-constrained wireless hearables with on-device speech AI models is challenging due to the high computational demands of streaming deep learning models. Speech AI models require continuous, real-time audio processing, imposing strict computational and I/O constraints. We present NeuralAids, a fully on-device speech AI system for wireless hearables, enabling real-time speech enhancement and denoising on compact, battery-constrained devices. Our system bridges the gap between state-of-the-art deep learning for speech enhancement and low-power AI hardware by making three key technical contributions: 1) a wireless hearable platform integrating a speech AI accelerator for efficient on-device streaming inference, 2) an optimized dual-path neural network designed for low-latency, high-quality speech enhancement, and 3) a hardware-software co-design that uses mixed-precision quantization and quantization-aware training to achieve real-time performance under strict power constraints. Our system processes 6 ms audio chunks in real time, achieving an inference time of 5.54 ms per chunk while consuming 71.6 mW. In real-world evaluations, including a user study with 28 participants, our system outperforms prior on-device models in speech quality and noise suppression, paving the way for next-generation intelligent wireless hearables that can enhance hearing entirely on-device.
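The real-time claim can be checked with simple arithmetic on the reported figures: streaming keeps up only if each chunk is processed in less time than the chunk lasts. The sketch below assumes that 71.6 mW is the average power drawn during continuous operation (the abstract does not say what the figure covers); the function names are hypothetical.

```python
CHUNK_MS = 6.0        # audio chunk duration reported in the abstract
INFERENCE_MS = 5.54   # inference time per chunk reported in the abstract
POWER_MW = 71.6       # reported power; assumed here to be the average draw


def real_time_factor(inference_ms, chunk_ms):
    """Processing time divided by audio duration; below 1.0 means real time."""
    return inference_ms / chunk_ms


def headroom_ms(inference_ms, chunk_ms):
    """Slack left in each chunk period after inference finishes."""
    return chunk_ms - inference_ms


def energy_per_chunk_uj(power_mw, chunk_ms):
    """Energy used per chunk period: mW times ms gives microjoules."""
    return power_mw * chunk_ms


rtf = real_time_factor(INFERENCE_MS, CHUNK_MS)
print(f"real-time factor: {rtf:.3f}")
print(f"headroom per chunk: {headroom_ms(INFERENCE_MS, CHUNK_MS):.2f} ms")
print(f"energy per chunk: {energy_per_chunk_uj(POWER_MW, CHUNK_MS):.1f} uJ")
```

With these numbers the real-time factor is about 0.92, leaving roughly 0.46 ms of slack per chunk, so the margin for any additional per-chunk work is small.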