Retrieval-Augmented Multimodal Depression Detection

📅 2025-10-29
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address high computational costs, domain mismatch, and static knowledge limitations in multimodal depression detection—particularly those arising from conventional affective analysis—this paper proposes a Retrieval-Augmented Generation (RAG)-driven emotional prompting mechanism. The method fuses textual, audio, and visual modalities, dynamically retrieves affect-relevant knowledge from an external emotional knowledge base, and leverages large language models to generate interpretable emotional prompts, thereby enhancing cross-domain affective representation. It effectively mitigates domain shift while improving model generalizability and interpretability. Evaluated on the AVEC 2019 dataset, it achieves state-of-the-art performance (Concordance Correlation Coefficient = 0.593, Mean Absolute Error = 3.95), significantly outperforming existing transfer learning and multi-task learning approaches. The core contribution is the first application of the RAG paradigm to multimodal depression detection, enabling dynamic, interpretable, and minimally supervised affective modeling.

Technology Category

Application Category

📝 Abstract
Multimodal deep learning has shown promise in depression detection by integrating text, audio, and video signals. Recent work leverages sentiment analysis to enhance emotional understanding, yet suffers from high computational cost, domain mismatch, and static knowledge limitations. To address these issues, we propose a novel Retrieval-Augmented Generation (RAG) framework. Given a depression-related text, our method retrieves semantically relevant emotional content from a sentiment dataset and uses a Large Language Model (LLM) to generate an Emotion Prompt as an auxiliary modality. This prompt enriches emotional representation and improves interpretability. Experiments on the AVEC 2019 dataset show our approach achieves state-of-the-art performance with CCC of 0.593 and MAE of 3.95, surpassing previous transfer learning and multi-task learning baselines.
Problem

Research questions and friction points this paper is trying to address.

Addressing computational cost and domain mismatch in depression detection
Overcoming static knowledge limitations in multimodal emotion analysis
Enhancing emotional representation and interpretability using retrieval-augmented generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-augmented generation framework for multimodal depression detection
Retrieves emotional content from sentiment dataset using LLM
Generates Emotion Prompt as auxiliary modality for representation
🔎 Similar Papers
No similar papers found.