Multiple-Instance, Cascaded Classification for Keyword Spotting in Narrow-Band Audio

📅 2017-11-21
🏛️ arXiv.org
📈 Citations: 4
Influential: 0
🤖 AI Summary
Keyword spotting (KWS) on 8 kHz narrow-band audio acquired in non-IID environments is harder than the setting most KWS systems target: bandwidth is limited, the data distribution shifts between acquisition environments, and edge devices impose tight resource constraints. Method: We propose a lightweight cascaded framework with multiple-instance learning (MIL), tailored for edge deployment. It combines MIL with cascaded deep neural networks (DNNs) and uses early termination to handle class imbalance and reduce computational overhead. Robustness is enhanced by fusing Mel spectrograms, MFCCs, and periodicity features. Results: The system achieves a 6% false rejection rate (FRR) at 0.75 false alarms per hour.
📝 Abstract
We propose using cascaded classifiers for a keyword spotting (KWS) task on narrow-band (NB), 8 kHz audio acquired in non-IID environments, a more challenging task than most state-of-the-art KWS systems face. We present a model that incorporates Deep Neural Networks (DNNs), cascading, multiple-feature representations, and multiple-instance learning. The cascaded classifiers handle the task's class imbalance and reduce power consumption on computationally constrained devices via early termination. The KWS system achieves a false negative rate of 6% at an hourly false positive rate of 0.75.
Problem

Research questions and friction points this paper is trying to address.

Keyword spotting in narrow-band 8 kHz audio
Handling class imbalance and power constraints
Achieving low false negative and false positive rates
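
To make the two reported error measures concrete, here is a minimal sketch of how a false negative rate and an hourly false positive rate could be computed from raw counts. The function names and the example counts are assumptions chosen for illustration; they are not taken from the paper's evaluation data.

```python
def false_negative_rate(missed_keywords: int, total_keywords: int) -> float:
    """Fraction of true keyword occurrences that the system failed to detect."""
    if total_keywords <= 0:
        raise ValueError("total_keywords must be positive")
    return missed_keywords / total_keywords


def false_positives_per_hour(false_alarms: int, audio_seconds: float) -> float:
    """Number of spurious detections normalized by hours of audio processed."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return false_alarms / (audio_seconds / 3600.0)


# Hypothetical counts (NOT from the paper), picked so the arithmetic is easy to follow.
fnr = false_negative_rate(missed_keywords=6, total_keywords=100)
fph = false_positives_per_hour(false_alarms=3, audio_seconds=4 * 3600)
print(f"false negative rate: {fnr:.2%}, false positives per hour: {fph:.2f}")
```

Reporting false positives per hour rather than per example reflects that non-keyword audio vastly outnumbers keyword occurrences in continuous listening.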
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cascaded classifiers handle class imbalance
Multiple-feature representations enhance accuracy
Early termination reduces power consumption
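
The early-termination idea can be sketched as a chain of stages in which any stage may reject an input before the later, costlier stages run. This is only an illustrative sketch: the paper's stages are DNNs, whereas the scoring functions, thresholds, and names below (`cascade_predict`, `mean_energy`, `peak_ratio`) are hypothetical stand-ins.

```python
from typing import Callable, List, Sequence, Tuple

# A stage is (scoring function, rejection threshold). Both are assumptions here.
Stage = Tuple[Callable[[Sequence[float]], float], float]


def cascade_predict(features: Sequence[float], stages: List[Stage]) -> Tuple[bool, int]:
    """Run stages in order; return (accepted, number_of_stages_evaluated)."""
    for index, (score_fn, threshold) in enumerate(stages, start=1):
        if score_fn(features) < threshold:
            # Early termination: remaining stages are skipped, saving computation.
            return False, index
    return True, len(stages)


def mean_energy(x: Sequence[float]) -> float:
    """Cheap first-stage score: average squared amplitude."""
    return sum(v * v for v in x) / len(x)


def peak_ratio(x: Sequence[float]) -> float:
    """Toy second-stage score: peak amplitude over mean absolute amplitude."""
    return max(abs(v) for v in x) / (sum(abs(v) for v in x) / len(x))


stages: List[Stage] = [(mean_energy, 0.01), (peak_ratio, 1.5)]
silence = [0.0, 0.001, -0.001, 0.0]
candidate = [0.1, 0.9, -0.2, 0.1]

print(cascade_predict(silence, stages))    # rejected by the first stage
print(cascade_predict(candidate, stages))  # passes both stages
```

Because most incoming audio is not the keyword, a cheap first stage that discards the bulk of negatives means the expensive stages run only on a small fraction of inputs, which is how a cascade can address both class imbalance and power use.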
Ahmad Abdulkader
Voicera
Kareem Nassar
Voicera
Mohamed Mahmoud
Voicera
Daniel Galvez
Cornell University
Computer Science, Speech Recognition, Machine Learning
Chetan Patil
Voicera