Unveil Multi-Picture Descriptions for Multilingual Mild Cognitive Impairment Detection via Contrastive Learning

📅 2025-05-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Automatic detection of Mild Cognitive Impairment (MCI) in multilingual, multi-image description scenarios remains challenging due to the limitations of existing single-image, single-language paradigms. Method: This paper proposes the first cross-lingual multimodal diagnostic framework integrating speech, text, and image modalities. Specifically, it (1) introduces supervised contrastive learning to enhance multilingual textual representations; (2) explicitly incorporates visual modality to aid semantic discrimination; and (3) employs a Product-of-Experts (PoE) ensemble strategy to mitigate spurious correlations and overfitting. Results: Evaluated on the TAUKDIAL-2024 multilingual, multi-image benchmark, our method achieves absolute improvements of +7.1 percentage points in Unweighted Average Recall (UAR: 68.1% → 75.2%) and +2.9 points in F1-score (80.6% → 83.5%) over text-only baselines, demonstrating the critical benefits of image-guided learning and contrastive representation learning for textual modalities.
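The supervised contrastive component can be illustrated with a minimal sketch of the batch-wise supervised contrastive loss (Khosla et al. style). It assumes L2-normalized utterance embeddings and per-sample class labels (e.g., MCI vs. healthy control); the NumPy formulation and function name are illustrative, not the paper's implementation:

```python
import numpy as np

def supcon_loss(embeddings: np.ndarray, labels: np.ndarray,
                temperature: float = 0.07) -> float:
    """Supervised contrastive loss over a batch of embeddings.

    For each anchor, positives are all other samples sharing its label;
    the loss pulls same-class representations together and pushes
    different-class representations apart.
    """
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature                    # scaled cosine similarities
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)              # exclude the anchor itself
    # log-softmax over all non-anchor similarities
    exp_sim = np.exp(sim) * not_self
    log_prob = sim - np.log(exp_sim.sum(axis=1, keepdims=True))
    pos_mask = (labels[:, None] == labels[None, :]) & not_self
    pos_counts = pos_mask.sum(axis=1)
    valid = pos_counts > 0                         # anchors with >=1 positive
    per_anchor = -(log_prob * pos_mask).sum(axis=1)[valid] / pos_counts[valid]
    return float(per_anchor.mean())
```

Tighter same-class clusters yield a lower loss, which is the property the framework exploits to sharpen multilingual textual representations.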

📝 Abstract
Detecting Mild Cognitive Impairment from picture descriptions is critical yet challenging, especially in multilingual and multi-picture settings. Prior work has primarily focused on English speakers describing a single picture (e.g., the 'Cookie Theft'). The TAUKDIAL-2024 challenge expands this scope by introducing multilingual speakers and multiple pictures, which presents new challenges in analyzing picture-dependent content. To address these challenges, we propose a framework with three components: (1) enhancing discriminative representation learning via supervised contrastive learning, (2) involving the image modality rather than relying solely on speech and text modalities, and (3) applying a Product of Experts (PoE) strategy to mitigate spurious correlations and overfitting. Our framework improves MCI detection performance, achieving a +7.1% increase in Unweighted Average Recall (UAR) (from 68.1% to 75.2%) and a +2.9% increase in F1 score (from 80.6% to 83.5%) compared to the unimodal text baseline. Notably, the contrastive learning component yields greater gains for the text modality than for speech. These results highlight our framework's effectiveness in multilingual and multi-picture MCI detection.
Problem

Research questions and friction points this paper is trying to address.

Detecting Mild Cognitive Impairment in multilingual picture descriptions
Addressing challenges in multi-picture content analysis
Improving detection accuracy via multimodal contrastive learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Supervised contrastive learning enhances representation
Image modality complements speech and text
Product of Experts reduces overfitting effectively
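The Product of Experts combination admits a minimal sketch: each modality expert (speech, text, image) outputs class probabilities, and their elementwise product, renormalized, forms the joint prediction. A class survives only if no single expert assigns it near-zero probability, which damps experts that latch onto spurious cues. The helper below is an illustrative sketch, not the paper's code:

```python
import numpy as np

def product_of_experts(prob_list, eps: float = 1e-8) -> np.ndarray:
    """Combine per-modality class probabilities via a Product of Experts.

    Multiplying distributions elementwise (summing log-probabilities)
    and renormalizing yields a consensus: any expert can veto a class
    by assigning it low probability.
    """
    # Sum log-probs for numerical stability, clipping away exact zeros.
    log_joint = sum(np.log(np.clip(p, eps, 1.0)) for p in prob_list)
    joint = np.exp(log_joint - log_joint.max(axis=-1, keepdims=True))
    return joint / joint.sum(axis=-1, keepdims=True)
```

For example, with hypothetical expert outputs [0.6, 0.4], [0.7, 0.3], and [0.5, 0.5] over two classes, the unnormalized products are 0.21 and 0.06, giving a joint prediction of roughly [0.78, 0.22].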
Authors
Kristin Qi (Computer Science, University of Massachusetts, Boston, MA, USA)
Jiali Cheng (UMass Lowell; Trustworthy AI, Language Agents, AI4Science)
Youxiang Zhu (Computer Science, University of Massachusetts, Boston, MA, USA)
Hadi Amiri (Computer Science, University of Massachusetts, Lowell, MA, USA)
Xiaohui Liang (University of Massachusetts Boston; Mobile Healthcare, Voice Technology, Internet of Things, Privacy)