Multimodal Cognitive Reframing Therapy via Multi-hop Psychotherapeutic Reasoning

📅 2025-02-08
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Cognitive Restructuring Therapy (CRT) in clinical practice largely overlooks nonverbal cues, limiting its capacity for empathic, evidence-based intervention. Method: This work pioneers the integration of visual modalities—specifically facial expressions—into CRT, introducing M2CoSC, the first multimodal, image-text paired dataset supporting multi-hop psychological reasoning. We propose a novel multi-hop psychological reasoning framework that explicitly models implicit emotional evidence chains, enabling evidence-grounded empathic intervention. Our approach synergistically combines vision-language models (VLMs) and large language models (LLMs), incorporating cross-modal alignment, multi-step reasoning prompting, and emotional evidence tracing. Results: Experiments on M2CoSC demonstrate substantial improvements in VLMs’ psychotherapeutic capability: generated recommendations exhibit significantly enhanced empathy and critical thinking over baselines, with a 23.6% gain in the composite metric (BLEU-4 + EmpathyScore).

Technology Category

Application Category

📝 Abstract
Previous research has revealed the potential of large language models (LLMs) to support cognitive reframing therapy; however, their focus was primarily on text-based methods, often overlooking the importance of non-verbal evidence crucial in real-life therapy. To alleviate this gap, we extend the textual cognitive reframing to multimodality, incorporating visual clues. Specifically, we present a new dataset called Multi Modal-Cognitive Support Conversation (M2CoSC), which pairs each GPT-4-generated dialogue with an image that reflects the virtual client's facial expressions. To better mirror real psychotherapy, where facial expressions lead to interpreting implicit emotional evidence, we propose a multi-hop psychotherapeutic reasoning approach that explicitly identifies and incorporates subtle evidence. Our comprehensive experiments with both LLMs and vision-language models (VLMs) demonstrate that the VLMs' performance as psychotherapists is significantly improved with the M2CoSC dataset. Furthermore, the multi-hop psychotherapeutic reasoning method enables VLMs to provide more thoughtful and empathetic suggestions, outperforming standard prompting methods.
Problem

Research questions and friction points this paper is trying to address.

Extends text-based to multimodal cognitive reframing therapy
Incorporates visual clues for emotional evidence interpretation
Improves VLMs' psychotherapy performance with new dataset
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends text therapy to multimodality
Introduces M2CoSC dataset with images
Proposes multi-hop reasoning for emotions
🔎 Similar Papers
No similar papers found.