🤖 AI Summary
ASR transcription errors severely degrade query accuracy in voice search. To address this, we propose a contextualized token discrimination method that uses BERT to model token-level contextual representations, enriches them with semantic information through a composition layer, and identifies and corrects erroneous tokens by measuring the discrepancy between each token's original and contextualized representations. Our key contributions are: (i) a context-aware token representation discrepancy mechanism for error detection; and (ii) ASR-QC, the first publicly released ASR error benchmark dataset for voice search correction, designed to standardize evaluation in this domain. Extensive experiments show that our approach significantly outperforms existing state-of-the-art models in accuracy, recall, and F1-score, supporting the effectiveness of the contextualized discrimination paradigm. This work provides both a principled framework for audio query correction and a standardized evaluation foundation for future research.
📝 Abstract
Query spelling correction is an important function of modern search engines because it helps users express their intent clearly. With the growing popularity of speech search driven by Automatic Speech Recognition (ASR) systems, this paper introduces a novel method named Contextualized Token Discrimination (CTD) for speech query correction. In CTD, we first employ BERT to generate token-level contextualized representations and then construct a composition layer to enhance semantic information. Finally, we produce the corrected query from the aggregated token representations, fixing erroneous tokens by comparing the original token representations with their contextualized counterparts. Extensive experiments demonstrate the superior performance of our proposed method across all metrics, and we further present a new benchmark dataset of erroneous ASR transcriptions to support comprehensive evaluation of audio query correction.
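The comparison step described in the abstract can be illustrated with a small toy sketch. It is not the paper's implementation: the hand-written vectors stand in for BERT outputs, and the `compose` feature layout, the cosine-based discrepancy score, the `threshold` value, and all function names are assumptions made purely for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def compose(original, contextual):
    """Hypothetical composition: concatenate [e; h; e - h; e * h].

    The abstract only says a composition layer enhances semantic
    information; this particular layout is an assumption.
    """
    diff = [a - b for a, b in zip(original, contextual)]
    prod = [a * b for a, b in zip(original, contextual)]
    return original + contextual + diff + prod


def flag_suspect_tokens(tokens, original_vecs, contextual_vecs, threshold=0.5):
    """Flag tokens whose original and contextualized vectors disagree.

    Discrepancy is taken as 1 - cosine similarity (an assumed stand-in
    for a learned discriminator).
    """
    suspects = []
    for tok, e, h in zip(tokens, original_vecs, contextual_vecs):
        discrepancy = 1.0 - cosine(e, h)
        if discrepancy > threshold:
            suspects.append((tok, discrepancy))
    return suspects


# Toy query in which "whether" is an ASR mishearing of "weather".
tokens = ["whether", "in", "paris"]
original_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # stand-in token embeddings
contextual_vecs = [[0.0, 1.0], [0.0, 1.0], [1.0, 1.0]]  # stand-in contextualized outputs
suspects = flag_suspect_tokens(tokens, original_vecs, contextual_vecs)
print([tok for tok, _ in suspects])
```

In a real system the two vector sets would come from an encoder such as BERT, and the flagged positions would be passed to a component that predicts the replacement token; both steps are outside the scope of this sketch.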