Contextualized Token Discrimination for Speech Search Query Correction

📅 2025-09-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

ASR transcription errors severely degrade query accuracy in voice search. To address this, we propose a contextualized token discrimination method that uses BERT to model token-level contextual representations, adds a composition layer to enrich their semantic information, and detects and corrects erroneous tokens by measuring the discrepancy between each token's original and contextualized representations. Our key contributions are: (i) a context-aware token representation discrepancy mechanism for error detection; and (ii) ASR-QC, the first publicly released benchmark dataset of ASR errors for voice search correction, designed to standardize evaluation in this domain. Experiments show that our approach outperforms existing state-of-the-art models in accuracy, recall, and F1-score, supporting the contextualized discrimination approach. The work provides both a framework for audio query correction and a standardized evaluation basis for future research.

📝 Abstract

Query spelling correction is an important function of modern search engines because it helps users express their intentions clearly. With the growing popularity of speech search driven by Automatic Speech Recognition (ASR) systems, this paper introduces a novel method named Contextualized Token Discrimination (CTD) for speech query correction. In CTD, we first employ BERT to generate token-level contextualized representations and then construct a composition layer to enhance semantic information. Finally, we produce the corrected query from the aggregated token representations, fixing incorrect tokens by comparing the original token representations with the contextualized representations. Extensive experiments demonstrate the superior performance of our proposed method across all metrics, and we further present a new benchmark dataset with erroneous ASR transcriptions to offer comprehensive evaluations for audio query correction.
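The comparison step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the hand-made vectors stand in for BERT outputs, the composition layer is omitted, and the cosine-similarity threshold, the nearest-neighbour replacement rule, and the names `cosine`, `correct_query`, `VOCAB`, and `CONTEXTUAL` are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def correct_query(tokens, contextual, vocab, threshold=0.5):
    """Keep tokens whose own embedding agrees with their contextual vector.

    A token is flagged when the similarity between its original embedding
    and its contextualized representation falls below `threshold`
    (an assumed decision rule). Flagged tokens are replaced by the
    vocabulary entry closest to the contextual vector.
    """
    corrected = []
    for token, ctx in zip(tokens, contextual):
        original = vocab.get(token)
        if original is not None and cosine(original, ctx) >= threshold:
            corrected.append(token)
        else:
            corrected.append(max(vocab, key=lambda w: cosine(vocab[w], ctx)))
    return corrected


# Hypothetical static embeddings, one orthogonal axis per word.
VOCAB = {
    "whether": [1.0, 0.0, 0.0, 0.0],
    "weather": [0.0, 1.0, 0.0, 0.0],
    "in": [0.0, 0.0, 1.0, 0.0],
    "paris": [0.0, 0.0, 0.0, 1.0],
}

# ASR output with a homophone error, plus made-up contextual vectors
# standing in for what an encoder might produce for each position.
QUERY = ["whether", "in", "paris"]
CONTEXTUAL = [
    [0.1, 0.9, 0.0, 0.0],   # context points toward "weather"
    [0.0, 0.0, 1.0, 0.0],   # context agrees with "in"
    [0.0, 0.0, 0.1, 0.95],  # context agrees with "paris"
]

print(correct_query(QUERY, CONTEXTUAL, VOCAB))
```

In this toy setup only the first token disagrees with its context, so it is the only one replaced. A trained model would learn the decision rule rather than rely on a fixed threshold.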

Problem

Research questions and friction points this paper is trying to address.

Correcting ASR-induced spelling errors in speech search queries

Improving the accuracy of ASR-transcribed queries for search

Enhancing query semantics with contextual token discrimination

Innovation

Methods, ideas, or system contributions that make the work stand out.

BERT for token-level contextual representations

Composition layer enhancing semantic information

Correcting tokens by comparing contextual representations
