Zero-Shot Cognitive Impairment Detection from Speech Using AudioLLM

📅 2025-06-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Early detection of cognitive impairment (CI) urgently requires non-invasive, generalizable assessment methods. This paper proposes the first zero-shot, cross-lingual CI detection framework based on the audio large language model (AudioLLM) Qwen2-Audio, eliminating reliance on manual annotations and language-specific acoustic features inherent in conventional supervised learning. Leveraging instruction-guided prompting, the model jointly processes acoustic and semantic information to discriminate CI at the utterance level without fine-tuning. Evaluated on English and multilingual datasets, our method achieves performance comparable to supervised baselines, exhibits high cross-lingual consistency, and maintains robustness across diverse cognitive assessment tasks. Our key contributions are: (i) the first application of AudioLLMs to zero-shot CI detection; (ii) substantial improvements in cross-lingual and cross-task generalization; and (iii) a novel paradigm for label-free, multilingual CI screening in low-resource settings.

📝 Abstract
Cognitive impairment (CI) is of growing public health concern, and early detection is vital for effective intervention. Speech has gained attention as a non-invasive and easily collectible biomarker for assessing cognitive decline. Traditional CI detection methods typically rely on supervised models trained on acoustic and linguistic features extracted from speech, which often require manual annotation and may not generalise well across datasets and languages. In this work, we propose the first zero-shot speech-based CI detection method using the Qwen2-Audio AudioLLM, a model capable of processing both audio and text inputs. By designing prompt-based instructions, we guide the model in classifying speech samples as indicative of normal cognition or cognitive impairment. We evaluate our approach on two datasets: one in English and another multilingual, spanning different cognitive assessment tasks. Our results show that the zero-shot AudioLLM approach achieves performance comparable to supervised methods and exhibits promising generalizability and consistency across languages, tasks, and datasets.
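The prompt-based setup described above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the instruction wording, the `NC`/`CI` label strings, and the `parse_label` helper are assumptions, and the call to Qwen2-Audio itself is elided.

```python
def build_ci_instruction(task_description: str) -> str:
    """Compose a zero-shot instruction asking the AudioLLM to classify a
    speech sample as normal cognition (NC) or cognitive impairment (CI).
    The wording is a hypothetical example, not the paper's prompt."""
    return (
        f"Listen to the attached speech recording of a {task_description}. "
        "Based on both how the speaker sounds and what they say, answer "
        "with exactly one label: 'NC' for normal cognition or 'CI' for "
        "cognitive impairment."
    )


def parse_label(model_reply: str) -> str:
    """Map the model's free-text reply to a binary label.

    Takes the first token of the reply, strips punctuation, and defaults
    to 'NC' unless it is exactly 'CI'.
    """
    tokens = model_reply.strip().upper().split()
    first = tokens[0].strip(".,:;'\"") if tokens else ""
    return "CI" if first == "CI" else "NC"


# Usage sketch: the reply would come from Qwen2-Audio given the audio
# clip plus this instruction; here we only show the text plumbing.
prompt = build_ci_instruction("picture description task")
label = parse_label("CI. The speaker pauses frequently...")
```

Because the model is used without fine-tuning, all task adaptation lives in the instruction text, which is what allows the same pipeline to be reused across languages and assessment tasks.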
Problem

Research questions and friction points this paper is trying to address.

Detect cognitive impairment from speech without prior training
Overcome dataset and language limitations in traditional methods
Use AudioLLM for zero-shot classification of cognitive states
Innovation

Methods, ideas, or system contributions that make the work stand out.

Zero-shot CI detection using AudioLLM
Prompt-based instructions for classification
Multilingual performance comparable to supervised methods