SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise

📅 2026-02-13

📝 Abstract
Spoken query retrieval is an important interaction mode in modern information retrieval. However, existing evaluation datasets are often limited to simple queries under constrained noise conditions, making them inadequate for assessing the robustness of spoken query retrieval systems under complex acoustic perturbations. To address this limitation, we present SQuTR, a robustness benchmark for spoken query retrieval that includes a large-scale dataset and a unified evaluation protocol. SQuTR aggregates 37,317 unique queries from six commonly used English and Chinese text retrieval datasets, spanning multiple domains and diverse query types. We synthesize speech using voice profiles from 200 real speakers and mix 17 categories of real-world environmental noise under controlled SNR levels, enabling reproducible robustness evaluation from quiet to highly noisy conditions. Under the unified protocol, we conduct large-scale evaluations on representative cascaded and end-to-end retrieval systems. Experimental results show that retrieval performance decreases as noise increases, with substantially different drops across systems. Even large-scale retrieval models struggle under extreme noise, indicating that robustness remains a critical bottleneck. Overall, SQuTR provides a reproducible testbed for benchmarking and diagnostic analysis, and facilitates future research on robustness in spoken query to text retrieval.
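The abstract describes mixing environmental noise into synthesized speech "under controlled SNR levels" but does not give the mixing formula. As a minimal sketch, assuming the standard power-ratio definition SNR_dB = 10 * log10(P_speech / P_noise), the noise clip can be rescaled to hit a target SNR before it is added to the speech. The function names `mean_power`, `mix_at_snr`, and `measured_snr_db` are hypothetical and are not taken from the SQuTR code; the authors' actual pipeline may differ (for example in how noise is cropped or how levels are normalized).

```python
import math


def mean_power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise rescaled to the target SNR in dB.

    Assumption: SNR is defined on mean power over the whole utterance.
    """
    if len(noise) == 0:
        raise ValueError("noise clip is empty")
    # Tile the noise if it is shorter than the speech, then crop to length.
    repeats = math.ceil(len(speech) / len(noise))
    noise = (list(noise) * repeats)[: len(speech)]

    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_noise == 0:
        raise ValueError("noise clip is silent")

    # Noise power needed so that p_speech / p_target equals 10^(snr_db / 10).
    p_target = p_speech / (10 ** (snr_db / 10))
    gain = math.sqrt(p_target / p_noise)
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, mixed):
    """Recover the SNR of a mixture given the clean speech it was built from."""
    residual = [m - s for s, m in zip(speech, mixed)]
    return 10 * math.log10(mean_power(speech) / mean_power(residual))
```

Calling `mix_at_snr(speech, noise, snr_db)` once per target level on the same speech and noise clips yields a graded series of the same query from quiet to highly noisy, which is what makes a per-level comparison of retrieval systems reproducible. Lower `snr_db` values mean louder noise relative to the speech.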
Problem

Research questions and friction points this paper is trying to address.

spoken query retrieval
robustness
acoustic noise
evaluation benchmark
speech-to-text retrieval
Innovation

Methods, ideas, or system contributions that make the work stand out.

spoken query retrieval
robustness benchmark
acoustic noise
speech synthesis
reproducible evaluation
Yuejie Li
Huazhong University of Science and Technology
Ke Yang
The University of Hong Kong
Yueying Hua
Soochow University
Berlin Chen
Professor of Computer Science and Information Engineering, National Taiwan Normal University
speech and natural language processing, computer-assisted language learning, machine learning
Jianhao Nie
Wuhan University
Yueping He
Tsinghua University
Caixin Kang
The University of Tokyo
Computer Vision, Trustworthy AI, Autonomous Driving, Generative Models