🤖 AI Summary
This study proposes VoxCog, a novel framework for automatic cross-lingual classification of cognitive impairments, such as Alzheimer's disease (AD) and mild cognitive impairment (MCI), using only speech signals, thereby eliminating reliance on textual transcripts or other modalities. Inspired by dialect identification, the approach treats pronunciation anomalies in patient speech (e.g., slowed speaking rate, prolonged syllables) as phonetic variations akin to dialectal ones. The system builds an end-to-end multilingual classification architecture by initializing a dialect classifier on top of a pretrained speech foundation model. Evaluated on the ADReSS 2020 and ADReSSo 2021 test sets, VoxCog achieves accuracies of 87.5% and 85.9%, respectively, outperforming existing methods that depend on multimodal inputs or large language models. This work demonstrates efficient, cross-lingual cognitive impairment detection from speech alone, without text or other modalities.
📝 Abstract
In this work, we present a novel perspective on cognitive impairment classification from speech by integrating speech foundation models that explicitly recognize dialects. Our motivation is the observation that individuals with Alzheimer's Disease (AD) or mild cognitive impairment (MCI) often produce measurable speech characteristics, such as slower articulation rate and lengthened sounds, in a manner similar to dialectal phonetic variation. Building on this idea, we introduce VoxCog, an end-to-end framework that uses pre-trained dialect models to detect AD or MCI without relying on additional modalities such as text or images. Through experiments on multiple multilingual datasets for AD and MCI detection, we demonstrate that initializing the model with a dialect classifier on top of a speech foundation model consistently improves predictive performance for AD and MCI. Our trained models match, and often exceed, previous approaches that ensemble several computational methods across different signal modalities. In particular, our end-to-end speech-based model achieves 87.5% and 85.9% accuracy on the ADReSS 2020 and ADReSSo 2021 challenge test sets, respectively, outperforming existing solutions that use multimodal ensembles or LLMs.
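The paper itself does not include code, but the initialization strategy it describes can be illustrated with a minimal PyTorch sketch, assuming a frame-level speech encoder and a linear classification head (all class and variable names here are hypothetical, not the authors' implementation): a classifier is first built for dialect identification, and its encoder is then reused to initialize the AD/MCI detector, with only the output head replaced.

```python
# Hedged sketch of dialect-classifier initialization for AD/MCI detection.
# ToyEncoder stands in for a real pretrained speech foundation model.
import torch
import torch.nn as nn


class ToyEncoder(nn.Module):
    """Illustrative encoder: splits raw audio into frames and projects them."""

    def __init__(self, frame: int = 160, hidden: int = 16):
        super().__init__()
        self.frame = frame
        self.proj = nn.Linear(frame, hidden)

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        batch, n_samples = wav.shape
        usable = n_samples - n_samples % self.frame
        frames = wav[:, :usable].reshape(batch, -1, self.frame)
        return self.proj(frames)  # (batch, n_frames, hidden)


class SpeechClassifier(nn.Module):
    """Encoder plus mean-pooled linear head, used for both tasks."""

    def __init__(self, encoder: nn.Module, hidden: int, n_classes: int):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        feats = self.encoder(wav)             # (batch, n_frames, hidden)
        return self.head(feats.mean(dim=1))   # (batch, n_classes)


# Step 1: a dialect-identification model (pretraining on dialect data omitted).
dialect_model = SpeechClassifier(ToyEncoder(), hidden=16, n_classes=5)

# Step 2: initialize the AD-vs-control model from the dialect-tuned encoder;
# only the output head is new, then the whole model is fine-tuned end to end.
ad_model = SpeechClassifier(dialect_model.encoder, hidden=16, n_classes=2)

wav = torch.randn(4, 16000)  # a batch of 1-second 16 kHz waveforms
logits = ad_model(wav)       # shape (4, 2)
```

In this reading, the dialect task supplies an encoder already sensitive to phonetic variation, which the abstract suggests is why the initialization consistently helps AD/MCI prediction.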