ClaritySpeech: Dementia Obfuscation in Speech

📅 2025-07-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Dementia induces atypical speech patterns that hinder communication and raise the risk of disclosing the speaker's identity or clinical condition, while conventional ASR systems are not robust to such speech. This paper proposes ClaritySpeech, a framework that needs no fine-tuning and little data. It chains ASR transcription, semantic-aware text perturbation, and zero-shot TTS reconstruction to obfuscate dementia-associated linguistic markers while preserving speaker identity. On the ADReSS and ADReSSo datasets, mean F1 score across adversarial settings drops by 16% and 10%, respectively, while about 50% speaker similarity is retained. WER improves from 0.73 to 0.08 for ADReSS and 0.15 for ADReSSo, and MOS speech quality rises from 1.65 to about 2.15. The approach thus balances privacy preservation, intelligibility, and naturalness.

📝 Abstract

Dementia, a neurodegenerative disease, alters speech patterns, creating communication barriers and raising privacy concerns. Current speech technologies, such as automatic speech recognition (ASR), struggle with dementia and atypical speech, further challenging accessibility. This paper presents ClaritySpeech, a novel framework for dementia obfuscation in speech that integrates ASR, text obfuscation, and zero-shot text-to-speech (TTS) to correct dementia-affected speech while preserving speaker identity in low-data environments without fine-tuning. Results show a 16% and 10% drop in mean F1 score across various adversarial settings and modalities (audio, text, fusion) for ADReSS and ADReSSo, respectively, while maintaining 50% speaker similarity. We also find that our system improves WER (from 0.73 to 0.08 for ADReSS and 0.15 for ADReSSo) and speech quality from 1.65 to ~2.15, enhancing privacy and accessibility.
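
The WER figures above are easier to read with the metric's definition in mind: WER is the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the paper's evaluation code; the function name `word_error_rate` and the example sentences are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences (assumed, not taken from ADReSS or ADReSSo):
# one of six reference words is missing from the hypothesis.
example_wer = word_error_rate(
    "the boy is taking a cookie",
    "the boy is taking cookie",
)
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference. Under this definition, a WER of 0.08 corresponds to roughly one word error per twelve reference words.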

Problem

Research questions and friction points this paper is trying to address.

Corrects dementia-affected speech patterns for better communication

Preserves speaker identity in low-data environments

Improves speech quality and privacy for dementia patients

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates ASR, text obfuscation, and zero-shot TTS

Corrects dementia speech without fine-tuning

Preserves speaker identity in low-data environments
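
The three-stage design listed above (ASR, then text obfuscation, then zero-shot TTS conditioned on the original speaker) can be sketched as a simple composition of functions. This is only a structural illustration under stated assumptions: the class `ObfuscationPipeline`, the type aliases, and the toy stand-in functions are hypothetical, and the filler-word filter is a placeholder rather than the paper's obfuscation method. In practice each stage would wrap a pretrained model.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; real stages would wrap pretrained models.
Transcriber = Callable[[bytes], str]          # audio -> transcript
Obfuscator = Callable[[str], str]             # transcript -> obfuscated text
Synthesizer = Callable[[str, bytes], bytes]   # (text, speaker reference) -> audio


@dataclass
class ObfuscationPipeline:
    transcribe: Transcriber
    obfuscate: Obfuscator
    synthesize: Synthesizer

    def run(self, audio: bytes) -> bytes:
        transcript = self.transcribe(audio)
        clean_text = self.obfuscate(transcript)
        # The original recording doubles as the speaker reference, so a
        # zero-shot TTS stage can imitate the voice without fine-tuning.
        return self.synthesize(clean_text, audio)


# Toy stand-ins (assumptions) that treat the "audio" bytes as UTF-8 text,
# so the data flow can be run without any speech models.
FILLERS = {"uh", "um", "er"}


def toy_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def toy_obfuscate(text: str) -> str:
    # Placeholder only: drops filler words, not the paper's method.
    return " ".join(w for w in text.split() if w.lower() not in FILLERS)


def toy_synthesize(text: str, speaker_reference: bytes) -> bytes:
    return text.encode("utf-8")


pipeline = ObfuscationPipeline(toy_transcribe, toy_obfuscate, toy_synthesize)
result = pipeline.run(b"the uh boy is um taking the cookie")
```

Keeping the stages as interchangeable callables reflects the no-fine-tuning claim: any off-the-shelf ASR, text rewriter, or zero-shot TTS system with a matching signature could be swapped in without retraining the others.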

🔎 Similar Papers

A Systematic Review of NLP for Dementia: Tasks, Datasets and Opportunities