VoxCog: Towards End-to-End Multilingual Cognitive Impairment Classification through Dialectal Knowledge

📅 2026-01-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study proposes VoxCog, a novel framework for automatic cross-lingual classification of cognitive impairments—such as Alzheimer’s disease and mild cognitive impairment—using only speech signals, thereby eliminating reliance on textual transcripts or multimodal data. Inspired by dialect identification, the approach treats pronunciation anomalies in patient speech (e.g., slowed speaking rate, prolonged syllables) as phonetic variations. Leveraging a pretrained speech foundation model, the system initializes a dialect classifier to build an end-to-end multilingual classification architecture. Evaluated on the ADReSS 2020 and ADReSSo 2021 test sets, VoxCog achieves accuracies of 87.5% and 85.9%, respectively, outperforming existing methods that depend on multimodal inputs or large language models. This work represents the first demonstration of efficient, modality-free, cross-lingual cognitive impairment detection solely from speech.

📝 Abstract
In this work, we present a novel perspective on cognitive impairment classification from speech by integrating speech foundation models that explicitly recognize speech dialects. Our motivation is based on the observation that individuals with Alzheimer's Disease (AD) or mild cognitive impairment (MCI) often produce measurable speech characteristics, such as slower articulation rate and lengthened sounds, in a manner similar to dialectal phonetic variations. Building on this idea, we introduce VoxCog, an end-to-end framework that uses pre-trained dialect models to detect AD or MCI without relying on additional modalities such as text or images. Through experiments on multiple multilingual datasets for AD and MCI detection, we demonstrate that initializing the model with a dialect classifier on top of a speech foundation model consistently improves predictive performance for AD and MCI. Our trained models yield similar or often better performance than previous approaches that ensembled several computational methods across different signal modalities. In particular, our end-to-end speech-based model achieves 87.5% and 85.9% accuracy on the ADReSS 2020 and ADReSSo 2021 challenge test sets, respectively, outperforming existing solutions that use multimodal ensembles or LLMs.
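The core idea — reusing a pretrained dialect classifier's head to initialize a cognitive-impairment classifier over foundation-model embeddings — can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the embedding size, the number of dialect classes, the mean-pooling step, and the column-slicing transfer scheme are hypothetical stand-ins, not the paper's actual initialization.

```python
import numpy as np

rng = np.random.default_rng(0)

EMB_DIM = 768     # typical speech-foundation-model embedding size (assumption)
N_DIALECTS = 8    # number of classes in the pretrained dialect head (assumption)

# Pretrained dialect classifier head: a linear layer over pooled embeddings.
W_dialect = rng.normal(scale=0.02, size=(EMB_DIM, N_DIALECTS))

# Initialize a binary AD-vs-control head from the dialect head, here by
# keeping two of its class columns — one simple transfer scheme; the
# paper's exact initialization is not specified on this page.
W_ad = W_dialect[:, :2].copy()

def classify(utterance_embedding: np.ndarray) -> int:
    """Return 0 (control) or 1 (AD/MCI) from a pooled utterance embedding."""
    logits = utterance_embedding @ W_ad
    return int(np.argmax(logits))

# Mean-pooled frame embeddings stand in for the foundation-model output.
frames = rng.normal(size=(200, EMB_DIM))  # 200 frames of a synthetic utterance
pooled = frames.mean(axis=0)
prediction = classify(pooled)
```

In a real end-to-end setup, both the encoder and the transferred head would then be fine-tuned on the AD/MCI speech data rather than kept frozen.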
Problem

Research questions and friction points this paper is trying to address.

cognitive impairment classification
Alzheimer's Disease
mild cognitive impairment
multilingual speech
dialectal variation
Innovation

Methods, ideas, or system contributions that make the work stand out.

speech foundation models
dialectal knowledge
cognitive impairment classification
end-to-end framework
multilingual speech analysis
Tiantian Feng
Postdoc Researcher
Health and Behaviors · Wearable Computing · Affective Computing · Speech and Biosignal · Responsible ML
Anfeng Xu
University of Southern California
Speech Processing · Multimodal AI · LLM · Deep Learning
Jinkook Lee
Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, 90089, CA, United States
Shrikanth S. Narayanan
Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California, 3740 McClintock Ave, Los Angeles, 90089, CA, United States