A Low-Resource Speech-Driven NLP Pipeline for Sinhala Dyslexia Assistance

📅 2025-10-06
🤖 AI Summary
To address the lack of assistive tools for adult dyslexic readers in low-resource, Sinhala-speaking settings, this work proposes an end-to-end, speech-driven NLP assistance system. Whisper transcribes speech to text, SinBERT detects spelling errors, a combined mT5–Mistral model generates corrected text, and gTTS delivers spoken feedback, closing a "listen–recognize–correct–read" loop. The pipeline is presented as the first multimodal dyslexia intervention designed specifically for a non-English, low-resource language, with the aim of improving linguistic accessibility. Evaluated under data-scarce conditions, the system achieves 0.66 transcription accuracy, 0.70 error-correction accuracy, and 0.65 end-to-end accuracy, supporting its feasibility. The work addresses a gap in inclusive NLP research for Sinhala and offers a modular design that could be reused for dyslexia support in other low-resource languages.
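The four-stage loop can be sketched as a small orchestrator with pluggable stages. This is a minimal illustration under stated assumptions, not the authors' implementation: the stage signatures, the `PipelineResult` type, the one-flag-per-token detector output, and the choice to skip correction when nothing is flagged are all hypothetical. In the described system the stages would wrap Whisper, SinBERT, the mT5/Mistral corrector, and gTTS; here toy stand-ins keep the sketch runnable without any models.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stage signatures (assumptions, not the paper's interfaces).
Transcribe = Callable[[bytes], str]   # e.g. a wrapper around Whisper
Detect = Callable[[str], List[bool]]  # one flag per whitespace token, e.g. SinBERT-based
Correct = Callable[[str], str]        # e.g. the combined mT5/Mistral corrector
Synthesize = Callable[[str], bytes]   # e.g. a wrapper around gTTS


@dataclass
class PipelineResult:
    transcript: str
    flags: List[bool]
    corrected: str
    audio: bytes


def run_pipeline(audio: bytes, transcribe: Transcribe, detect: Detect,
                 correct: Correct, synthesize: Synthesize) -> PipelineResult:
    """Listen -> recognize -> correct -> read, with each stage injected."""
    transcript = transcribe(audio)
    flags = detect(transcript)
    # Design assumption: only invoke the generative corrector when at least
    # one token was flagged; otherwise pass the transcript through unchanged.
    corrected = correct(transcript) if any(flags) else transcript
    return PipelineResult(transcript, flags, corrected, synthesize(corrected))


if __name__ == "__main__":
    # Toy stand-ins (hypothetical) so the sketch runs without any models.
    fake_asr = lambda audio: audio.decode("utf-8")
    lexicon = {"the", "cat", "sat"}
    fake_detect = lambda text: [w not in lexicon for w in text.split()]
    fake_correct = lambda text: " ".join({"teh": "the"}.get(w, w) for w in text.split())
    fake_tts = lambda text: text.encode("utf-8")

    result = run_pipeline(b"teh cat sat", fake_asr, fake_detect, fake_correct, fake_tts)
    print(result.corrected)  # the cat sat
```

Injecting the stages as callables keeps the modular, swap-in design the summary describes: a different ASR model or corrector can be substituted without touching the loop itself.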

📝 Abstract
Dyslexia in adults remains under-researched and under-served, particularly in non-English-speaking contexts, despite its significant impact on personal and professional life. This work addresses that gap for Sinhala, a low-resource language with few tools for linguistic accessibility. We present an assistive system designed explicitly for Sinhala-speaking adults with dyslexia. The system integrates Whisper for speech-to-text conversion; SinBERT, an open-source BERT model for Sinhala, fine-tuned to identify common dyslexic errors; and a combined mT5- and Mistral-based model that generates corrected text. Finally, gTTS converts the output back to speech, completing a multimodal feedback loop. Despite the scarcity of Sinhala-language datasets, the system achieves 0.66 transcription accuracy, 0.70 correction accuracy, and 0.65 overall system accuracy. These results demonstrate the feasibility and effectiveness of the approach. Ultimately, this work highlights the importance of inclusive Natural Language Processing (NLP) technologies for underrepresented languages and showcases a practical
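The abstract reports accuracy figures without defining them here. One common way to express transcription quality as a single "accuracy" is word accuracy = 1 - WER, where WER is the word-level edit distance divided by the number of reference words. The sketch below computes that quantity; it is an illustration of a standard metric, and whether the authors used this definition (or a character-level one, which is often preferred for Sinhala script) is an open assumption.

```python
from typing import List


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Minimum substitutions + insertions + deletions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # delete ref token r
                          curr[j - 1] + 1,         # insert hyp token h
                          prev[j - 1] + (r != h))  # substitute, or free match
        prev = curr
    return prev[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """1 - WER, clipped at 0 because WER exceeds 1 when insertions pile up."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        return 1.0 if not hyp else 0.0
    wer = word_edit_distance(ref, hyp) / len(ref)
    return max(0.0, 1.0 - wer)


if __name__ == "__main__":
    # One substitution (b -> x) and one deletion (d) over 4 reference words.
    print(word_accuracy("a b c d", "a x c"))  # 0.5
```

The same function applies unchanged to the correction stage by comparing corrected output against a gold corrected sentence.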
Problem

Research questions and friction points this paper is trying to address.

Developing speech-driven NLP assistance for Sinhala-speaking adults with dyslexia
Addressing limited linguistic accessibility tools for low-resource languages
Creating multimodal dyslexia support with transcription and correction capabilities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Whisper for Sinhala speech-to-text conversion
Employs SinBERT to detect dyslexic errors in text
Combines mT5 and Mistral models for text correction
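A detector such as SinBERT produces per-token judgments, while highlighting or targeted correction usually needs character spans. The sketch below merges consecutive flagged tokens into `(start, end)` spans of the original text. It assumes word-level boolean flags (one per whitespace-separated token) have already been derived from the model's output; that format, and the helper name `flagged_spans`, are hypothetical and not taken from the paper.

```python
import re
from typing import List, Tuple


def flagged_spans(text: str, flags: List[bool]) -> List[Tuple[int, int]]:
    """Merge runs of flagged whitespace tokens into (start, end) character spans.

    `flags` is assumed to hold one boolean per whitespace-separated token
    (hypothetical format, e.g. word-level labels from a token classifier).
    """
    tokens = [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    if len(tokens) != len(flags):
        raise ValueError("need exactly one flag per token")
    spans: List[Tuple[int, int]] = []
    prev_flagged = False
    for (start, end), flagged in zip(tokens, flags):
        if flagged and prev_flagged:
            # Extend the current span across the adjacent flagged token.
            spans[-1] = (spans[-1][0], end)
        elif flagged:
            spans.append((start, end))
        prev_flagged = flagged
    return spans


if __name__ == "__main__":
    text = "teh cat sta on mat"  # toy English input for illustration
    spans = flagged_spans(text, [True, False, True, True, False])
    print([text[s:e] for s, e in spans])  # ['teh', 'sta on']
```

Because the regex works on Unicode non-space runs, the same helper applies to Sinhala text as long as the detector's labels are aligned to whitespace-separated words.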
Peshala Perera
Informatics Institute of Technology, 57, Ramakrishna Road, Colombo 06, Sri Lanka
Deshan Sumanathilaka
PhD Candidate at Swansea University
NLP, Machine Translation and Transliteration, ML, WSD