🤖 AI Summary
Existing ASR/TTS systems exhibit poor adaptability to speech disorders (e.g., dysarthria, stuttering, aphasia), suffer from high latency in edge deployment, and lack personalization support. To address these challenges, this paper proposes SpeechAgent—a unified, end-to-end mobile speech assistance architecture. SpeechAgent integrates a lightweight large language model (LLM) for semantic reasoning with real-time, on-device speech processing modules, enabling adaptive recognition and natural speech synthesis across diverse speech disorders. Through model compression and edge-optimized inference, it achieves low-latency (<300 ms) and high-accuracy operation directly on mobile devices. Evaluated on a real-world speech disorder dataset, SpeechAgent improves ASR word accuracy by 12.6% over strong baselines and attains a TTS Mean Opinion Score (MOS) of 4.1, significantly outperforming prior approaches. The system demonstrates practical viability for daily assistive communication.
📝 Abstract
Speech is essential for human communication, yet millions of people face impairments such as dysarthria, stuttering, and aphasia, conditions that often lead to social isolation and reduced participation. Despite recent progress in automatic speech recognition (ASR) and text-to-speech (TTS) technologies, accessible web and mobile infrastructures for users with impaired speech remain limited, hindering the practical adoption of these advances in daily communication. To bridge this gap, we present SpeechAgent, a mobile agent designed to assist people with speech impairments in everyday communication. The system integrates large language model (LLM)-driven reasoning with advanced speech processing modules, providing adaptive support tailored to diverse impairment types. To ensure real-world practicality, we develop a structured deployment pipeline that enables real-time speech processing on mobile and edge devices, achieving imperceptible latency while maintaining high accuracy and speech quality. Evaluation on real-world impaired speech datasets and edge-device latency profiling confirms that SpeechAgent delivers both effective and user-friendly performance, demonstrating its feasibility for personalized, day-to-day assistive communication.