🤖 AI Summary
This work addresses long-horizon drift in large language models during mental health conversations, which undermines the effectiveness of extended interventions. The authors propose a dual-role state-space modeling approach that, for the first time, formalizes the therapist–client interaction as a bidirectionally evolving dynamic process. The framework continuously tracks both parties’ mental states, goal alignment, and short-term goals, and uses them to drive strategy selection when generating motivational interviewing dialogues. Experimental results show that the method improves Effectiveness and Goal Alignment over strong baselines and remains more stable as conversations grow longer. Notably, although it initiates fewer therapist redirections, it achieves the highest client acceptance rate (64.3%), which suggests more precise intervention timing.
📝 Abstract
Large Language Models (LLMs) are increasingly used in mental health-related settings, yet they struggle to sustain realistic, goal-directed dialogue over extended interactions. While LLMs generate fluent responses, they optimize locally for the next turn rather than maintaining a coherent model of therapeutic progress, leading to brittleness and long-horizon drift. We introduce CALM-IT, a framework for generating and evaluating long-form Motivational Interviewing (MI) dialogues that explicitly models dual-actor conversational dynamics. CALM-IT represents therapist-client interaction as a bidirectional state-space process, in which both agents continuously update inferred alignment, mental states, and short-term goals to guide strategy selection and utterance generation. Across large-scale evaluations, CALM-IT consistently outperforms strong baselines in Effectiveness and Goal Alignment and remains substantially more stable as conversation length increases. Although CALM-IT initiates fewer therapist redirections, it achieves the highest client acceptance rate (64.3%), indicating more precise and therapeutically aligned intervention timing. Overall, CALM-IT provides evidence that modeling evolving conversational state is essential for generating high-quality long-form synthetic conversations.
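To make the bidirectional state-space idea concrete, here is a minimal Python sketch of one dialogue turn. It is not CALM-IT's implementation: the `ActorState` fields only mirror the three quantities named in the abstract (alignment, mental state, short-term goal), and the update rule, the `threshold` parameter, the strategy labels, and every function name are assumptions introduced for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ActorState:
    """Hypothetical per-actor state; the field names are illustrative assumptions."""
    alignment: float      # inferred alignment with the other party, in [0, 1]
    mental_state: str     # coarse label, e.g. "ambivalent"
    short_term_goal: str  # what this actor wants from the next turn


def update_state(state: ActorState, signal: float, step: float = 0.5) -> ActorState:
    """Move alignment part of the way toward an observed signal in [0, 1].

    This exponential-smoothing rule is an assumption, not the paper's update.
    """
    moved = state.alignment + step * (signal - state.alignment)
    return replace(state, alignment=max(0.0, min(1.0, moved)))


def select_strategy(therapist: ActorState, threshold: float = 0.5) -> str:
    """Redirect only when inferred alignment is low; otherwise keep reflecting."""
    return "redirect" if therapist.alignment < threshold else "reflect"


def step_turn(
    therapist: ActorState,
    client: ActorState,
    client_signal: float,
    therapist_signal: float,
) -> tuple[ActorState, ActorState, str]:
    """One bidirectional step: both actors update, then the therapist picks a strategy."""
    new_therapist = update_state(therapist, client_signal)
    new_client = update_state(client, therapist_signal)
    return new_therapist, new_client, select_strategy(new_therapist)


def acceptance_rate(redirect_outcomes: list[bool]) -> float:
    """Share of therapist redirections that the client accepted."""
    if not redirect_outcomes:
        return 0.0
    return sum(redirect_outcomes) / len(redirect_outcomes)


if __name__ == "__main__":
    therapist = ActorState(0.8, "attentive", "explore ambivalence")
    client = ActorState(0.6, "ambivalent", "be heard")
    therapist, client, strategy = step_turn(therapist, client, 0.0, 1.0)
    print(strategy, round(therapist.alignment, 2), round(client.alignment, 2))
```

In this sketch a redirection is triggered only when the therapist's inferred alignment falls below the threshold, which is one simple way a system could initiate fewer redirections while having a larger share of them accepted. `acceptance_rate` shows how a figure such as the reported 64.3% would be computed from per-redirection outcomes under that reading of the metric.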