Quantization-Based Score Calibration for Few-Shot Keyword Spotting with Dynamic Time Warping in Noisy Environments

📅 2025-10-17
🤖 AI Summary
Few-shot keyword spotting (KWS) systems generalize poorly to unseen data under noisy conditions because their detection thresholds are tuned on a validation set. To address this, we propose a score calibration method for template-based, open-set few-shot KWS with Dynamic Time Warping (DTW). The approach has two steps: embeddings are quantized, and the DTW detection scores are then normalized using the resulting quantization error before thresholding. This reduces the dependence of threshold selection on validation-set tuning and makes the scores more robust across noise conditions. Experiments on the KWS-DailyTalk dataset with simulated high-noise radio channels show that the method improves F1-score by up to 12.3% and simplifies the choice of thresholds that carry over to other acoustic environments, which improves the practical deployability of few-shot KWS systems.

📝 Abstract
Detecting occurrences of keywords with keyword spotting (KWS) systems requires thresholding continuous detection scores. Selecting appropriate thresholds is a non-trivial task, typically relying on optimizing the performance on a validation dataset. However, such greedy threshold selection often leads to suboptimal performance on unseen data, particularly in varying or noisy acoustic environments or few-shot settings. In this work, we investigate detection threshold estimation for template-based open-set few-shot KWS using dynamic time warping on noisy speech data. To mitigate the performance degradation caused by suboptimal thresholds, we propose a score calibration approach consisting of two different steps: quantizing embeddings and normalizing detection scores using the quantization error prior to thresholding. Experiments on KWS-DailyTalk with simulated high frequency radio channels show that the proposed calibration approach simplifies the choice of detection thresholds and significantly improves the resulting performance.
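To make the template-matching setup concrete, here is a minimal sketch of DTW scoring followed by thresholding. It is an illustration under assumptions, not the authors' implementation: it aligns a whole query segment to one template (the paper's system may use a subsequence variant), uses cosine distance between frame embeddings, and normalizes the accumulated cost by the summed sequence lengths. The function names `dtw_cost` and `detect` are hypothetical.

```python
import numpy as np


def cosine_distance_matrix(a, b):
    """Pairwise cosine distances between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a @ b.T


def dtw_cost(template, query):
    """Length-normalized DTW alignment cost between two embedding sequences.

    Assumption: full-sequence alignment with steps (1,0), (0,1), (1,1).
    """
    dist = cosine_distance_matrix(template, query)
    n, m = dist.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_previous = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = dist[i - 1, j - 1] + best_previous
    return acc[n, m] / (n + m)


def detect(template, query, threshold):
    """Declare a keyword hit when the alignment cost is at most the threshold."""
    return dtw_cost(template, query) <= threshold
```

A lower cost means a better match, so the decision depends entirely on where the threshold sits relative to the cost distribution, which is the quantity the proposed calibration is meant to stabilize.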
Problem

Research questions and friction points this paper is trying to address.

Optimizing detection thresholds for few-shot keyword spotting systems
Addressing performance degradation in noisy acoustic environments
Calibrating scores to improve threshold selection robustness
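The greedy threshold selection that these points refer to can be sketched as follows. This is a hedged illustration, assuming that detection scores are costs (lower means a better match) and that F1 is the metric optimized on the validation set; `select_threshold` is a hypothetical helper, not something taken from the paper.

```python
import numpy as np


def f1_score(labels, predictions):
    """F1-score for boolean label and prediction arrays."""
    tp = np.sum(predictions & labels)
    fp = np.sum(predictions & ~labels)
    fn = np.sum(~predictions & labels)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0


def select_threshold(costs, labels):
    """Pick the cost threshold that maximizes F1 on validation data."""
    best_threshold, best_f1 = None, -1.0
    for candidate in np.unique(costs):
        f1 = f1_score(labels, costs <= candidate)
        if f1 > best_f1:
            best_threshold, best_f1 = candidate, f1
    return best_threshold, best_f1


# Toy validation data: two keyword hits with low cost, two non-keywords.
val_costs = np.array([0.1, 0.2, 0.6, 0.7])
val_labels = np.array([True, True, False, False])
threshold, val_f1 = select_threshold(val_costs, val_labels)

# Assumed effect of a noisier channel: every cost shifts upward.
noisy_costs = val_costs + 0.3
noisy_f1 = f1_score(val_labels, noisy_costs <= threshold)
```

In this toy case the threshold that is perfect on the validation costs detects nothing once the costs shift, which illustrates why a threshold tuned in one acoustic condition may not transfer to another.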
Innovation

Methods, ideas, or system contributions that make the work stand out.

Quantizing embeddings for score calibration
Normalizing detection scores using quantization error
Improving keyword spotting in noisy few-shot settings
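The two calibration steps listed above can be sketched roughly as below. The summary does not give the exact formulas, so everything here is an assumption made for illustration: the codebook is taken as given (for example, learned with k-means), the quantization error is the Euclidean distance of each frame to its nearest codeword, and the raw detection score is divided by the mean quantization error. The names `quantize` and `calibrate_score` are hypothetical.

```python
import numpy as np


def quantize(embeddings, codebook):
    """Map each frame embedding to its nearest codeword.

    Returns the quantized frames and the per-frame quantization error
    (Euclidean distance to the chosen codeword).
    """
    distances = np.linalg.norm(
        embeddings[:, None, :] - codebook[None, :, :], axis=2
    )
    nearest = distances.argmin(axis=1)
    errors = distances[np.arange(len(embeddings)), nearest]
    return codebook[nearest], errors


def calibrate_score(raw_score, quantization_errors, eps=1e-8):
    """Normalize a raw detection score by the mean quantization error.

    Assumption: simple division; the paper's normalization may differ.
    """
    return raw_score / (quantization_errors.mean() + eps)


# Toy example with a two-codeword codebook in two dimensions.
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
frames = np.array([[0.1, 0.0], [0.9, 1.0]])
quantized, errors = quantize(frames, codebook)
calibrated = calibrate_score(0.2, errors)
```

The intuition assumed in this sketch is that noisier input tends to sit farther from the codebook, so scaling scores by the quantization error puts them on a more comparable footing before a single threshold is applied.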
Kevin Wilkinghoff
Department of Electronic Systems, Aalborg University, Denmark
Alessia Cornaggia-Urrigshardt
Fraunhofer FKIE, Wachtberg, Germany
Zheng-Hua Tan
Professor of Machine Learning and Speech Processing, Aalborg University and Pioneer Centre for AI
Machine learning, deep learning, self-supervised learning, speech processing, multimodal.