STEAMROLLER: A Multi-Agent System for Inclusive Automatic Speech Recognition for People who Stutter

📅 2026-01-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the poor performance of current automatic speech recognition (ASR) systems on disfluent speech, which excludes individuals with speech disorders from mainstream voice-based interaction. The authors propose a multi-stage, multi-agent real-time repair framework that integrates ASR transcription, collaborative multi-agent text refinement, and speech synthesis to substantially reduce word error rates while preserving semantic integrity. By iteratively optimizing transcriptions and incorporating stutter-specific fine-tuning strategies, the approach achieves significant performance gains on the FluencyBank dataset and high user satisfaction in human evaluations. This study establishes a new paradigm for developing inclusive speech AI systems that accommodate diverse speech patterns.
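The three-stage flow described above (ASR transcription, iterative multi-agent text repair, speech synthesis) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the rule-based "agents", and the canned transcript are hypothetical stand-ins, not the authors' implementation, which presumably relies on real ASR, language-model agents, and a speech synthesizer.

```python
from typing import Callable, List

# Hypothetical type: an "agent" here is any function that rewrites a transcript.
Agent = Callable[[str], str]

def transcribe(audio: bytes) -> str:
    """Placeholder ASR stage: returns a canned disfluent transcript."""
    return "I I I w- w- want to um book a a flight"

def drop_fillers(text: str) -> str:
    """Toy agent: remove common filler words."""
    fillers = {"um", "uh", "er"}
    return " ".join(w for w in text.split() if w.lower() not in fillers)

def drop_fragments(text: str) -> str:
    """Toy agent: remove cut-off word fragments such as 'w-'."""
    return " ".join(w for w in text.split() if not w.endswith("-"))

def collapse_repeats(text: str) -> str:
    """Toy agent: collapse immediate word repetitions ('I I I' -> 'I')."""
    out: List[str] = []
    for w in text.split():
        if not out or out[-1].lower() != w.lower():
            out.append(w)
    return " ".join(out)

def repair(text: str, agents: List[Agent], max_rounds: int = 3) -> str:
    """Apply every agent in turn, repeating until the text stops changing."""
    for _ in range(max_rounds):
        revised = text
        for agent in agents:
            revised = agent(revised)
        if revised == text:  # converged: no agent changed anything
            break
        text = revised
    return text

def synthesize(text: str) -> bytes:
    """Placeholder TTS stage: encodes text instead of producing audio."""
    return text.encode("utf-8")

def pipeline(audio: bytes) -> bytes:
    """Chain the three stages: ASR -> text repair -> synthesis."""
    transcript = transcribe(audio)
    fluent = repair(transcript, [drop_fillers, drop_fragments, collapse_repeats])
    return synthesize(fluent)

print(pipeline(b""))  # b'I want to book a flight'
```

Purely rule-based agents like these would also delete legitimate repetitions (for example "that that"), which is one reason a repair stage has to check that semantic intent is preserved.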

📝 Abstract
People who stutter (PWS) face systemic exclusion in today's voice-driven society, where access to voice assistants, authentication systems, and remote work tools increasingly depends on fluent speech. Current automatic speech recognition (ASR) systems, trained predominantly on fluent speech, fail to serve millions of PWS worldwide. We present STEAMROLLER, a real-time system that transforms stuttered speech into fluent output through a novel multi-stage, multi-agent AI pipeline. Our approach addresses three critical technical challenges: (1) the difficulty of direct speech-to-speech conversion for disfluent input, (2) semantic distortions introduced during ASR transcription of stuttered speech, and (3) latency constraints for real-time communication. STEAMROLLER employs a three-stage architecture comprising ASR transcription, multi-agent text repair, and speech synthesis; its core innovation is a collaborative multi-agent framework that iteratively refines transcripts while preserving semantic intent. Experiments on the FluencyBank dataset and a user study demonstrate clear word error rate (WER) reduction and strong user satisfaction. Beyond immediate accessibility benefits, fine-tuning ASR on STEAMROLLER-repaired speech yields additional WER improvements, creating a pathway toward inclusive AI ecosystems.
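For readers unfamiliar with the metric, WER is conventionally defined as (S + D + I) / N, where S, D, and I are the word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, and N is the number of reference words. The sketch below computes it with a standard word-level edit distance. The example sentences are made up for illustration and are not results from the paper; the `wer` helper is a hypothetical name, and evaluation toolkits may normalize text differently.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)

reference = "i want to book a flight"
raw = "i i i want to um book a a flight"   # made-up disfluent transcript
repaired = "i want to book a flight"       # made-up repaired transcript

print(round(wer(reference, raw), 3))       # 0.667 (4 extra words / 6)
print(wer(reference, repaired))            # 0.0
```

Because extra words count as errors, WER can exceed 1.0 when a transcript contains many repetitions, which is why disfluent speech can score so poorly even when every intended word is recognized.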
Problem

Research questions and friction points this paper is trying to address.

stuttering
automatic speech recognition
speech accessibility
disfluent speech
inclusive AI
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent system
inclusive ASR
stuttered speech processing
text repair
real-time speech conversion
Ziqi Xu
Lecturer, School of Computing Technologies, RMIT University
Causal AI, Fairness
Yi Liu
AI Research @ Quantstamp | PhD @ NTU | BEng @ SUSTech
AI Agent, Software Engineering, LLM Security
Yuekang Li
Lecturer (Assistant Professor), University of New South Wales
Software Engineering, Software Security, AI Red Teaming
Ling Shi
Nanyang Technological University, Singapore
Kailong Wang
Huazhong University of Science and Technology, China
Yongxin Zhao
Shanghai Key Laboratory of Trustworthy Computing, East China Normal University, Shanghai, China