Variational Low-Rank Adaptation for Personalized Impaired Speech Recognition

📅 2025-09-23
🤖 AI Summary
Non-standard speech, arising from conditions such as cerebral palsy or post-stroke dysarthria, severely degrades the performance of mainstream ASR systems (e.g., Whisper), primarily due to scarce training data, high acoustic variability, and prohibitive annotation costs. To address this, we propose a personalized ASR framework based on Variational Low-Rank Adaptation (VLora), which integrates Bayesian inference with parameter-efficient fine-tuning to achieve efficiency in both data and annotation usage under few-shot and cross-lingual settings. Implemented atop Whisper, our method fine-tunes only a small set of low-rank parameters while explicitly modeling predictive uncertainty, enhancing adaptation to speaker-specific acoustic distributions with minimal supervision. Experiments on the English UA-Speech and German BF-Sprache datasets demonstrate substantial WER reductions over baseline fine-tuning approaches, significantly lowering reliance on high-quality labeled data. Our approach offers a scalable, low-resource solution for inclusive, personalized speech recognition.
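The core idea, a low-rank weight update whose factors carry a Gaussian posterior rather than a point estimate, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the layer sizes, the choice of which factor is variational, the standard-normal prior, and all variable names are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r = 8, 16, 2  # hypothetical layer sizes and LoRA rank

# Frozen pretrained weight (stands in for one Whisper projection matrix).
W = rng.standard_normal((d_out, d_in))

# Variational low-rank factors: Gaussian posterior over A (mean and
# log-variance per entry), point estimate for B, initialized small.
A_mu = np.zeros((r, d_in))
A_logvar = np.full((r, d_in), -4.0)
B = rng.standard_normal((d_out, r)) * 0.01

def sample_delta(rng):
    """Reparameterization trick: A = mu + sigma * eps, then Delta W = B @ A."""
    eps = rng.standard_normal(A_mu.shape)
    A = A_mu + np.exp(0.5 * A_logvar) * eps
    return B @ A

def kl_to_standard_normal():
    """KL(q(A) || N(0, I)) summed over entries; regularizes the posterior."""
    return 0.5 * np.sum(np.exp(A_logvar) + A_mu**2 - 1.0 - A_logvar)

# Adapted forward pass: only A_mu, A_logvar, B would be trained;
# repeated sampling of Delta W yields a predictive-uncertainty estimate.
x = rng.standard_normal(d_in)
y = (W + sample_delta(rng)) @ x
```

In training, the KL term would be added to the ASR loss so the adapter stays close to the prior when few-shot data is scarce, which is one way to read the paper's claim of data-efficient, uncertainty-aware adaptation.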

📝 Abstract
Speech impairments resulting from congenital disorders, such as cerebral palsy, Down syndrome, or Apert syndrome, as well as acquired brain injuries due to stroke, traumatic accidents, or tumors, present major challenges to automatic speech recognition (ASR) systems. Despite recent advancements, state-of-the-art ASR models like Whisper still struggle with non-normative speech due to limited training data availability and high acoustic variability. Moreover, collecting and annotating non-normative speech is burdensome: speaking is effortful for many affected individuals, while laborious annotation often requires caregivers familiar with the speaker. This work introduces a novel ASR personalization method based on Bayesian Low-rank Adaptation for data-efficient fine-tuning. We validate our method on the English UA-Speech dataset and a newly collected German speech dataset, BF-Sprache, from a child with structural speech impairment. The dataset and approach are designed to reflect the challenges of low-resource settings that include individuals with speech impairments. Our method significantly improves ASR accuracy for impaired speech while maintaining data and annotation efficiency, offering a practical path toward inclusive ASR.
Problem

Research questions and friction points this paper is trying to address.

Improving ASR accuracy for impaired speech from congenital disorders and brain injuries
Addressing data scarcity and acoustic variability in non-normative speech recognition
Developing data-efficient personalization methods for low-resource impaired speech settings
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian Low-rank Adaptation for fine-tuning
Personalized ASR for impaired speech
Data-efficient method for low-resource settings
Niclas Pokel
Institute of Neuroinformatics, University of Zurich and ETH Zurich, Switzerland
Pehuén Moure
Institute of Neuroinformatics, University of Zurich and ETH Zurich, Switzerland
Roman Boehringer
Institute of Neuroinformatics, University of Zurich and ETH Zurich, Switzerland
Shih-Chii Liu
Institute of Neuroinformatics, University of Zurich and ETH Zurich, Switzerland
Spiking neuromorphic sensors, event-driven deep learning, neuromorphic computing, BM interfaces
Yingqiang Gao
Department of Computational Linguistics, University of Zurich, Switzerland