Mirror: Multimodal Cognitive Reframing Therapy for Rolling with Resistance

📅 2025-04-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Existing text-based AI-CBT models struggle to detect and respond effectively to client resistance, which undermines the therapeutic alliance. Method: We propose a multimodal intervention framework featuring (1) Mirror, a novel synthetic multimodal dataset for resistance scenarios that pairs client utterances with corresponding facial images; (2) a "rolling with resistance" intervention paradigm that jointly models facial expression and utterance semantics; and (3) an integrated pipeline combining vision-language representation learning, fine-grained affective reasoning, LLM-driven empathic response generation, and quantitative therapeutic alliance assessment. Contribution/Results: Experiments demonstrate significant improvements in resistance detection and empathic responsiveness. Our model outperforms text-only baselines on both counseling skill metrics and alliance strength measures. This work moves AI-powered psychotherapy from unimodal (text-only) paradigms toward multimodal frameworks that can perceive and interpret nonverbal emotional cues.

📝 Abstract
Recent studies have explored the use of large language models (LLMs) in psychotherapy; however, text-based cognitive behavioral therapy (CBT) models often struggle with client resistance, which can weaken the therapeutic alliance. To address this, we propose a multimodal approach that incorporates nonverbal cues, allowing the AI therapist to better align its responses with the client's negative emotional state. Specifically, we introduce Multimodal Interactive Rolling with Resistance (Mirror), a novel synthetic dataset that pairs client statements with corresponding facial images. Using this dataset, we train baseline Vision-Language Models (VLMs) that analyze facial cues, infer emotions, and generate empathetic responses to manage resistance effectively. We then evaluate these models on both the therapist's counseling skills and the strength of the therapeutic alliance in the presence of client resistance. Our results demonstrate that Mirror significantly enhances the AI therapist's ability to handle resistance, outperforming existing text-based CBT approaches.
Problem

Research questions and friction points this paper is trying to address.

Address client resistance in text-based CBT using multimodal cues
Enhance AI therapist empathy via facial emotion analysis
Improve therapeutic alliance during resistance with Vision-Language Models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal approach using nonverbal cues
Synthetic dataset pairing statements with facial images
Vision-Language Models analyzing facial cues for empathy
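
To make the pairing of statements with facial images concrete, here is a minimal Python sketch of what one sample and its prompt might look like. Everything in it is an assumption for illustration: the class name `ResistanceSample`, its fields, the file path, and the prompt wording are hypothetical and are not taken from the Mirror dataset or the authors' code. The three prompt steps simply mirror the abstract's description (analyze facial cues, infer emotions, generate an empathetic response).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResistanceSample:
    """One hypothetical record: a client statement paired with a facial image."""
    client_statement: str
    face_image_path: str  # illustrative path, not a real dataset file


def build_vlm_prompt(sample: ResistanceSample) -> str:
    """Assemble a text prompt asking a VLM to read the face, infer emotion, then reply."""
    return (
        f"<image: {sample.face_image_path}>\n"
        f'Client says: "{sample.client_statement}"\n'
        "Step 1: Describe the client's facial cues.\n"
        "Step 2: Infer the client's emotional state.\n"
        "Step 3: Write an empathetic reply that rolls with the resistance."
    )


# Hypothetical example of a resistant client turn.
sample = ResistanceSample(
    client_statement="I don't think this exercise will help me.",
    face_image_path="faces/session_001.png",
)
prompt = build_vlm_prompt(sample)
print(prompt)
```

In a real pipeline the image placeholder would be replaced by whatever image input the chosen VLM expects; the sketch only shows how the two modalities could be kept together in one record.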