AI Summary
This study addresses the PROCESS Challenge 2024, aiming to enable non-invasive, automatic detection of mild cognitive impairment (MCI) from clinician-guided spoken tasks (narration, repetition, and semantic generation), using both speech and transcribed text. We propose a multi-source complementary modeling framework that integrates acoustic pause features, LIWC-based linguistic statistics, macro-linguistic descriptors generated by large language models (LLMs), and multimodal neural representations (Longformer for text; ECAPA-TDNN and TRILLsson for speech). These heterogeneous features are fused via an XGBoost/SVM ensemble classifier to achieve cross-modal discriminative learning. Our approach achieves strong, balanced performance on the three-class AD/MCI/HC classification task, improving accuracy and generalizability across subtasks. The method supports early screening of cognitive decline from naturalistic dialogue in a manner that is interpretable, lightweight, and multimodal.
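The fusion step above can be illustrated with a minimal sketch: heterogeneous feature blocks are concatenated (early fusion) and classified with a soft-voting ensemble of a boosted-tree model and an SVM. This is not the authors' code; the feature values are synthetic placeholders, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 120  # hypothetical number of speakers

# Placeholder feature blocks standing in for the paper's sources:
# pause statistics, LIWC counts, and a neural text embedding.
pause_feats = rng.normal(size=(n, 4))
liwc_feats = rng.normal(size=(n, 10))
text_embed = rng.normal(size=(n, 32))

# Early fusion: concatenate heterogeneous blocks into one vector.
X = np.hstack([pause_feats, liwc_feats, text_embed])
y = rng.integers(0, 3, size=n)  # three classes: HC / MCI / AD

# Soft-voting ensemble: average class probabilities of both models.
ensemble = VotingClassifier(
    estimators=[
        ("gb", GradientBoostingClassifier(random_state=0)),
        ("svm", make_pipeline(StandardScaler(), SVC(probability=True))),
    ],
    voting="soft",
)
ensemble.fit(X, y)
preds = ensemble.predict(X[:5])
```

In practice, per-block scaling and model selection over many such feature/classifier combinations (as described in the abstract) would replace the fixed choices shown here.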
Abstract
This work describes our group's submission to the PROCESS Challenge 2024, whose goal is to assess cognitive decline through spontaneous speech elicited by three guided clinical tasks. This joint effort followed a holistic approach, encompassing knowledge-based acoustic and text-based feature sets, LLM-based macrolinguistic descriptors, pause-based acoustic biomarkers, and multiple neural representations (e.g., Longformer, ECAPA-TDNN, and TRILLsson embeddings). Combining these feature sets with different classifiers resulted in a large pool of models, from which we selected those that provided the best balance between train, development, and individual class performance. Our results show that our best-performing systems correspond to combinations of mutually complementary models, relying on acoustic and textual information from all three clinical tasks.