Variational Low-Rank Adaptation for Personalized Impaired Speech Recognition

📅 2025-09-23
🤖 AI Summary
Non-standard speech, arising from conditions such as cerebral palsy or post-stroke dysarthria, severely degrades the performance of mainstream ASR systems (e.g., Whisper), primarily due to scarce training data, high acoustic variability, and prohibitive annotation costs. To address this, we propose a personalized ASR framework based on Variational Low-Rank Adaptation (VLora), which integrates Bayesian inference with parameter-efficient fine-tuning to achieve efficiency in both data and annotation usage under few-shot and cross-lingual settings. Implemented atop Whisper, our method fine-tunes only a small set of low-rank parameters while explicitly modeling predictive uncertainty, enhancing adaptation to speaker-specific acoustic distributions with minimal supervision. Experiments on the English UA-Speech and German BF-Sprache datasets demonstrate substantial WER reductions over baseline fine-tuning approaches, significantly lowering reliance on high-quality labeled data. Our approach offers a scalable, low-resource solution for inclusive, personalized speech recognition.
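The core idea, a low-rank weight update whose factors carry a Gaussian posterior rather than a point estimate, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the layer sizes, the choice of which factor is variational, the standard-normal prior, and all variable names are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r = 8, 16, 2  # hypothetical layer sizes and LoRA rank

# Frozen pretrained weight (stands in for one Whisper projection matrix).
W = rng.standard_normal((d_out, d_in))

# Variational low-rank factors: Gaussian posterior over A (mean and
# log-variance per entry), point estimate for B, initialized small.
A_mu = np.zeros((r, d_in))
A_logvar = np.full((r, d_in), -4.0)
B = rng.standard_normal((d_out, r)) * 0.01

def sample_delta(rng):
    """Reparameterization trick: A = mu + sigma * eps, then Delta W = B @ A."""
    eps = rng.standard_normal(A_mu.shape)
    A = A_mu + np.exp(0.5 * A_logvar) * eps
    return B @ A

def kl_to_standard_normal():
    """KL(q(A) || N(0, I)) summed over entries; regularizes the posterior."""
    return 0.5 * np.sum(np.exp(A_logvar) + A_mu**2 - 1.0 - A_logvar)

# Adapted forward pass: only A_mu, A_logvar, B would be trained;
# repeated sampling of Delta W yields a predictive-uncertainty estimate.
x = rng.standard_normal(d_in)
y = (W + sample_delta(rng)) @ x
```

In training, the KL term would be added to the ASR loss so the adapter stays close to the prior when few-shot data is scarce, which is one way to read the paper's claim of data-efficient, uncertainty-aware adaptation.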

📝 Abstract
Speech impairments resulting from congenital disorders, such as cerebral palsy, Down syndrome, or Apert syndrome, as well as acquired brain injuries due to stroke, traumatic accidents, or tumors, present major challenges to automatic speech recognition (ASR) systems. Despite recent advancements, state-of-the-art ASR models like Whisper still struggle with non-normative speech due to limited training data availability and high acoustic variability. Moreover, collecting and annotating non-normative speech is burdensome: speaking is effortful for many affected individuals, while laborious annotation often requires caregivers familiar with the speaker. This work introduces a novel ASR personalization method based on Bayesian Low-rank Adaptation for data-efficient fine-tuning. We validate our method on the English UA-Speech dataset and a newly collected German speech dataset, BF-Sprache, from a child with structural speech impairment. The dataset and approach are designed to reflect the challenges of low-resource settings that include individuals with speech impairments. Our method significantly improves ASR accuracy for impaired speech while maintaining data and annotation efficiency, offering a practical path toward inclusive ASR.
Problem

Research questions and friction points this paper is trying to address.

Improving ASR accuracy for impaired speech from congenital disorders and brain injuries
Addressing data scarcity and acoustic variability in non-normative speech recognition
Developing data-efficient personalization methods for low-resource impaired speech settings
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian Low-rank Adaptation for fine-tuning
Personalized ASR for impaired speech
Data-efficient method for low-resource settings
Niclas Pokel
Institute of Neuroinformatics, University of Zurich and ETH Zurich, Switzerland
Pehuén Moure
Institute of Neuroinformatics, University of Zurich and ETH Zurich, Switzerland
Roman Boehringer
Institute of Neuroinformatics, University of Zurich and ETH Zurich, Switzerland
Shih-Chii Liu
Institute of Neuroinformatics, University of Zurich and ETH Zurich, Switzerland
Spiking neuromorphic sensors, event-driven deep learning, neuromorphic computing, BM interfaces
Yingqiang Gao
Department of Computational Linguistics, University of Zurich, Switzerland