PAREDA: A Multi-Accent Speech Dataset of Natural Language Processing Research Discussions

📅 2026-05-18
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This study addresses the significant performance degradation of current automatic speech recognition (ASR) systems when processing real-world speech characterized by diverse accents, high spontaneity, and domain-specific terminology. To bridge this gap, the authors introduce the first multi-accent speech dataset tailored to academic discussion scenarios, featuring Australian, Indian, and Chinese English accents. The dataset comprises spoken summaries and question-answering interactions centered on natural language processing research papers, thereby combining technical vocabulary with conversational dynamics. Using this resource, the paper evaluates end-to-end ASR models under both zero-shot and fine-tuned settings. Results demonstrate that state-of-the-art models exhibit high word error rates in zero-shot conditions, whereas fine-tuning with PAREDA yields substantial improvements, underscoring the dataset's critical role in advancing ASR robustness and accent inclusivity.
📝 Abstract
While modern Automatic Speech Recognition (ASR) systems achieve high accuracy on benchmark corpora, their performance often degrades under real-world variability. This work focuses on variability arising from accented, spontaneous, and domain-specific speech. In particular, we introduce PAper REading DAtaset (PAREDA), a first-of-its-kind multi-accent speech dataset consisting of discussions on academic Natural Language Processing (NLP) papers between speakers with Australian, Indian, and Chinese English accents. Each session elicits a spontaneous monologue (a summary of a paper's abstract) and a non-monologue (a question-and-answer session between participants), resulting in a corpus rich with technical jargon and conversational phenomena. We evaluate the performance of SOTA ASR models on PAREDA, analysing the impact of accent mixing and increased speech rate. Our results show that, in the zero-shot setting, models perform worse, confirming the dataset's challenging nature. However, fine-tuning on PAREDA significantly reduces the Word Error Rate (WER), demonstrating that our dataset captures linguistic characteristics often missing from existing corpora. PAREDA serves as a valuable new resource for building and evaluating more robust and inclusive ASR systems for specialised, real-world applications.
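The abstract reports results as Word Error Rate (WER). As a minimal sketch of how that metric is usually computed, the snippet below takes the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and an ASR hypothesis and divides it by the number of reference words. The function name, the lowercase-and-whitespace normalisation, and the example sentences are illustrative assumptions and are not taken from the paper, whose own scoring pipeline may normalise text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Assumption: text is lowercased and split on whitespace only; a real
    evaluation pipeline would typically also normalise punctuation and numbers.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missed)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: two technical terms are misrecognised (two substitutions).
reference = "we fine tune the transformer encoder"
hypothesis = "we find tune the transform encoder"
wer = word_error_rate(reference, hypothesis)
print(f"WER = {wer:.3f}")
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many inserted words. A lower value after fine-tuning is the kind of improvement the abstract refers to.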
Problem

Research questions and friction points this paper is trying to address.

Automatic Speech Recognition
accent variability
spontaneous speech
domain-specific language
speech dataset
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-accent speech
spontaneous speech
domain-specific ASR
PAREDA dataset
Word Error Rate (WER)