AppTek Call-Center Dialogues: A Multi-Accent Long-Form Benchmark for English ASR

📅 2026-04-30

🤖 AI Summary

This study addresses limitations of existing English automatic speech recognition (ASR) evaluation benchmarks: many are pre-segmented into short utterances, consist of read or prepared speech, or lack explicit accent annotations, which hinders assessment of ASR robustness across accents. To bridge this gap, the authors introduce a dataset of spontaneous, role-played agent-customer dialogues spanning 14 English accents and 16 service scenarios. The benchmark combines multi-accent, long-form, and conversational speech, and because none of its audio or text was publicly available before release, the risk of overlap with large-scale pretraining corpora is reduced. Evaluations of several open-source ASR systems under different segmentation strategies show substantial variation across accents and segmentation methods, indicating that good performance on general American English benchmarks does not necessarily generalize to other accents.

📝 Abstract

Evaluating English ASR systems for conversational AI applications remains difficult, as many publicly available corpora are either pre-segmented into short segments, consist of read or prepared speech, or lack explicit dialect annotations to evaluate robustness for a diverse user base. This work presents the AppTek Call-Center Dialogues corpus, a collection of spontaneous, role-played agent-customer conversations spanning fourteen English accents covering sixteen service-oriented scenarios. The dataset was commissioned specifically for evaluation and none of the audio or text was publicly available prior to release, reducing the risk of overlap with existing large-scale pretraining corpora. We benchmark a set of open-source ASR systems under different segmentation approaches. Results show substantial variation across accents and segmentation methods, indicating that good performance on general American English benchmarks does not necessarily generalize to other accents.
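
The per-accent comparison described in the abstract could be computed roughly as follows. This is a minimal sketch, assuming word error rate (WER) as the metric (the abstract does not name one) and a hypothetical input of (accent, reference, hypothesis) triples. The function names and the lowercase-and-split text normalization are illustrative choices, not taken from the paper.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer_by_accent(samples):
    """Pool errors per accent and divide by total reference words.

    samples: iterable of (accent, reference, hypothesis) strings.
    The accent labels and normalization here are assumptions
    made for illustration only.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for accent, ref, hyp in samples:
        ref_tokens = ref.lower().split()
        hyp_tokens = hyp.lower().split()
        errors[accent] += edit_distance(ref_tokens, hyp_tokens)
        words[accent] += len(ref_tokens)
    return {a: errors[a] / words[a] for a in words if words[a] > 0}
```

Pooling errors over all reference words for an accent, rather than averaging per-utterance rates, keeps long and short recordings weighted by their length, which matters when dialogues are long-form and segmentation varies between runs.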

Problem

Research questions and friction points this paper is trying to address.

English ASR

accent robustness

conversational speech

dialect annotation

evaluation benchmark

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-accent

long-form dialogue

spontaneous speech