Dementia Insights: A Context-Based MultiModal Approach

📅 2025-03-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the challenges of expert-dependent labeling, modality fragmentation, and poor generalizability in non-invasive early dementia screening, this paper proposes a context-aware multimodal framework that jointly processes raw speech and transcribed text. Semantic features are extracted using pretrained language models (BERT, GPT), while acoustic features are derived via CLAP; a context-embedding-driven cross-modal fusion module integrates these representations. Crucially, this work introduces In-Context Learning (ICL) to dementia risk identification for the first time, eliminating reliance on manually annotated labels. Experimental results demonstrate an F1-score of 83.33%, significantly outperforming state-of-the-art methods. The approach achieves high diagnostic accuracy while drastically reducing annotation costs, and exhibits strong scalability and clinical applicability potential.

Technology Category

Application Category

📝 Abstract
Dementia, a progressive neurodegenerative disorder, affects memory, reasoning, and daily functioning, creating challenges for individuals and healthcare systems. Early detection is crucial for timely interventions that may slow disease progression. Large pre-trained models (LPMs) for text and audio, such as Generative Pre-trained Transformer (GPT), Bidirectional Encoder Representations from Transformers (BERT), and Contrastive Language-Audio Pretraining (CLAP), have shown promise in identifying cognitive impairments. However, existing studies generally rely heavily on expert-annotated datasets and unimodal approaches, limiting robustness and scalability. This study proposes a context-based multimodal method, integrating both text and audio data using the best-performing LPMs in each modality. By incorporating contextual embeddings, our method improves dementia detection performance. Additionally, motivated by the effectiveness of contextual embeddings, we further experimented with a context-based In-Context Learning (ICL) as a complementary technique. Results show that GPT-based embeddings, particularly when fused with CLAP audio features, achieve an F1-score of $83.33%$, surpassing state-of-the-art dementia detection models. Furthermore, raw text data outperforms expert-annotated datasets, demonstrating that LPMs can extract meaningful linguistic and acoustic patterns without extensive manual labeling. These findings highlight the potential for scalable, non-invasive diagnostic tools that reduce reliance on costly annotations while maintaining high accuracy. By integrating multimodal learning with contextual embeddings, this work lays the foundation for future advancements in personalized dementia detection and cognitive health research.
Problem

Research questions and friction points this paper is trying to address.

Early detection of dementia using multimodal data.
Improving dementia detection with contextual embeddings.
Reducing reliance on expert-annotated datasets for diagnostics.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal fusion of text and audio data
Contextual embeddings enhance dementia detection
In-Context Learning complements detection accuracy
🔎 Similar Papers
2024-01-02IEEE International Conference on Bioinformatics and BiomedicineCitations: 0