A Low-Resource Speech-Driven NLP Pipeline for Sinhala Dyslexia Assistance

📅 2025-10-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the lack of assistive tools for adult dyslexic readers in low-resource Sinhala-speaking environments, this work proposes an end-to-end speech-driven NLP assistance system, described as the first multimodal dyslexia intervention pipeline designed specifically for a non-English, low-resource language. Whisper performs speech-to-text transcription; SinBERT detects common dyslexic errors in the transcript; a combined mT5 and Mistral-based model generates corrected text; and gTTS delivers spoken feedback, closing a “listen, recognize, correct, read” loop. Evaluated under data-scarce conditions, the system achieves 0.66 transcription accuracy, 0.70 error-correction accuracy, and 0.65 overall system accuracy, which supports its feasibility. The work addresses a gap in inclusive NLP research for Sinhala and offers a modular design that could be reused for dyslexia support in other low-resource languages.

📝 Abstract

Dyslexia in adults remains an under-researched and under-served area, particularly in non-English-speaking contexts, despite its significant impact on personal and professional lives. This work addresses that gap by focusing on Sinhala, a low-resource language with limited tools for linguistic accessibility. We present an assistive system explicitly designed for Sinhala-speaking adults with dyslexia. The system integrates Whisper for speech-to-text conversion, SinBERT, an open-sourced fine-tuned BERT model trained for Sinhala to identify common dyslexic errors, and a combined mT5 and Mistral-based model to generate corrected text. Finally, the output is converted back to speech using gTTS, creating a complete multimodal feedback loop. Despite the challenges posed by limited Sinhala-language datasets, the system achieves 0.66 transcription accuracy and 0.7 correction accuracy with 0.65 overall system accuracy. These results demonstrate both the feasibility and effectiveness of the approach. Ultimately, this work highlights the importance of inclusive Natural Language Processing (NLP) technologies in underrepresented languages and showcases a practical

Problem

Research questions and friction points this paper is trying to address.

Developing speech-driven NLP assistance for Sinhala dyslexic adults

Addressing limited linguistic accessibility tools for low-resource languages

Creating multimodal dyslexia support with transcription and correction capabilities

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Whisper for Sinhala speech-to-text conversion

Employs SinBERT to detect dyslexic errors in text

Combines mT5 and Mistral models for text correction
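
The four stages listed above can be wired together as a simple chain. The following is a minimal sketch of that structure only, not the authors' implementation: the class name `DyslexiaAssistPipeline`, the `PipelineResult` fields, the stage signatures, and the choice to skip correction when nothing is flagged are all assumptions made for illustration. The stub stages stand in for Whisper, SinBERT, the mT5 and Mistral-based corrector, and gTTS, and use English placeholder text so the example runs without any models or network access.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PipelineResult:
    transcript: str      # raw speech-to-text output
    flagged: List[int]   # indices of tokens the detector marked as suspect
    corrected: str       # text after correction
    audio_path: str      # where the spoken feedback was written


@dataclass
class DyslexiaAssistPipeline:
    # Each stage is a plain callable, so real models can be swapped in later.
    transcribe: Callable[[str], str]            # audio path -> text (Whisper in the paper)
    detect_errors: Callable[[str], List[int]]   # text -> flagged token indices (SinBERT)
    correct: Callable[[str, List[int]], str]    # text + flags -> corrected text (mT5 + Mistral)
    speak: Callable[[str], str]                 # text -> audio file path (gTTS)

    def run(self, audio_path: str) -> PipelineResult:
        transcript = self.transcribe(audio_path)
        flagged = self.detect_errors(transcript)
        # Assumption: skip the generator when the detector flags nothing.
        corrected = self.correct(transcript, flagged) if flagged else transcript
        return PipelineResult(transcript, flagged, corrected, self.speak(corrected))


# --- Hypothetical stub stages (English placeholders, not Sinhala models) ---
def stub_transcribe(audio_path: str) -> str:
    return "teh cat sat"


def stub_detect(text: str) -> List[int]:
    return [i for i, tok in enumerate(text.split()) if tok == "teh"]


def stub_correct(text: str, flagged: List[int]) -> str:
    tokens = text.split()
    for i in flagged:
        tokens[i] = "the"
    return " ".join(tokens)


def stub_speak(text: str) -> str:
    return "feedback.mp3"


pipeline = DyslexiaAssistPipeline(stub_transcribe, stub_detect, stub_correct, stub_speak)
result = pipeline.run("utterance.wav")
print(result.corrected)
```

Keeping each stage behind a callable mirrors the modular design the summary describes: any one component can be replaced or evaluated on its own without touching the others.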
