The Interspeech 2025 Speech Accessibility Project Challenge

📅 2025-07-29
📈 Citations: 0 · Influential: 0
🤖 AI Summary
To address the poor automatic speech recognition (ASR) performance for individuals with speech disabilities and the scarcity of high-quality, publicly available training data, this work introduces one of the largest open datasets of impaired speech to date, comprising over 400 hours of audio from more than 500 speakers with diverse speech disabilities. It establishes a dual-metric evaluation benchmark that jointly considers word error rate (WER) and a semantic similarity score (SemScore), implemented through a standardized remote evaluation pipeline on EvalAI. With Whisper-large-v2 as the baseline, the challenge fosters end-to-end ASR model development for impaired speech. Of the 22 valid participating teams, 12 achieved lower WER and 17 attained higher SemScore than the baseline; the best-performing system reached a WER of 8.11% and a SemScore of 88.44%, substantially advancing the state of the art. Together, the data, the evaluation benchmark, and the reported methods provide a foundation for accessible speech recognition.
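
The dual-metric evaluation pairs a literal transcription metric (WER) with a meaning-preservation metric (SemScore). The challenge's exact SemScore implementation is not described here, so the sketch below is only an approximation: it computes WER with the jiwer library and substitutes a sentence-embedding cosine similarity (via sentence-transformers), scaled to 0-100, for SemScore; the embedding model and the scaling are assumptions.

```python
# Approximate dual-metric scoring: jiwer for WER, a sentence-embedding
# similarity as a stand-in for SemScore (model choice and 0-100 scaling
# are assumptions, not the challenge's official implementation).
from jiwer import wer
from sentence_transformers import SentenceTransformer, util

references = ["please turn on the living room lights"]
hypotheses = ["please turn on the living room light"]

# Word Error Rate: word-level edit distance over the test set.
wer_score = wer(references, hypotheses)

# Semantic similarity: cosine similarity between sentence embeddings,
# averaged over utterances and scaled to a 0-100 range.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
ref_emb = embedder.encode(references, convert_to_tensor=True)
hyp_emb = embedder.encode(hypotheses, convert_to_tensor=True)
sem_score = util.cos_sim(ref_emb, hyp_emb).diagonal().mean().item() * 100

print(f"WER: {wer_score:.2%}  SemScore (approx.): {sem_score:.2f}")
```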

📝 Abstract
While the last decade has witnessed significant advances in Automatic Speech Recognition (ASR), the performance of these systems for individuals with speech disabilities remains inadequate, partly due to limited public training data. To bridge this gap, the 2025 Interspeech Speech Accessibility Project (SAP) Challenge was launched, utilizing over 400 hours of SAP data collected and transcribed from more than 500 individuals with diverse speech disabilities. Hosted on EvalAI and leveraging its remote evaluation pipeline, the SAP Challenge evaluates submissions on Word Error Rate (WER) and Semantic Score (SemScore). In total, 12 of the 22 valid teams outperformed the Whisper-large-v2 baseline on WER, while 17 teams surpassed the baseline on SemScore. Notably, the top team achieved both the lowest WER of 8.11% and the highest SemScore of 88.44%, setting new benchmarks for future ASR systems in recognizing impaired speech.
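
The baseline referenced above is Whisper-large-v2; a minimal way to reproduce such a baseline is to run the public checkpoint through the Hugging Face transformers ASR pipeline, as sketched below. The organizers' exact decoding configuration (language forcing, beam size, chunking) is not given in the abstract, so the settings shown are assumptions, and the input path is a placeholder.

```python
# Minimal baseline sketch: transcribe one recording with Whisper-large-v2
# via the Hugging Face transformers pipeline. Decoding settings here are
# assumptions, not the challenge's official configuration.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v2",
    chunk_length_s=30,  # Whisper processes audio in 30-second windows
)

# "sample.wav" is a placeholder for a 16 kHz mono recording.
result = asr("sample.wav")
print(result["text"])
```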
Problem

Research questions and friction points this paper is trying to address.

Improving ASR for individuals with speech disabilities despite limited public training data
Evaluating ASR performance using Word Error Rate and Semantic Score
Setting new benchmarks for impaired speech recognition
Innovation

Methods, ideas, or system contributions that make the work stand out.

Utilized 400+ hours of transcribed speech from individuals with diverse speech disabilities
Evaluated submissions using WER and SemScore metrics
Set new benchmarks, with the top system reaching 8.11% WER and 88.44% SemScore