AI Summary
Low real-world adoption of closed-loop insulin delivery systems (CLIDS) among individuals with type 1 diabetes stems primarily from behavioral, psychological, and social barriers, not technical limitations. Method: We introduce the first standardized benchmark framework for evaluating health-promoting dialogues, integrating a clinically validated virtual patient repository, multi-strategy nurse agents, and evidence-based persuasive techniques, all built upon large language models (LLMs) to enable scalable, high-fidelity, multi-turn, personalized dialogue simulation. Contribution/Results: The framework uniquely supports longitudinal counseling and adversarial social pressure scenarios. It reveals, for the first time, systematic deficiencies in current LLMs' capacity to enact behavioral interventions under strong resistance and authentic social stress. Experiments show that reflective LLMs can dynamically adapt strategies, yet overall performance remains insufficient against deep-seated resistance. This framework establishes a reproducible, empirically grounded evaluation paradigm for AI-driven behavioral interventions in chronic disease management.
Abstract
Real-world adoption of closed-loop insulin delivery systems (CLIDS) in type 1 diabetes remains low, driven not by technical failure, but by diverse behavioral, psychosocial, and social barriers. We introduce ChatCLIDS, the first benchmark to rigorously evaluate LLM-driven persuasive dialogue for health behavior change. Our framework features a library of expert-validated virtual patients, each with clinically grounded, heterogeneous profiles and realistic adoption barriers, and simulates multi-turn interactions with nurse agents equipped with a diverse set of evidence-based persuasive strategies. ChatCLIDS uniquely supports longitudinal counseling and adversarial social influence scenarios, enabling robust, multi-dimensional evaluation. Our findings reveal that while larger and more reflective LLMs adapt strategies over time, all models struggle to overcome resistance, especially under realistic social pressure. These results highlight critical limitations of current LLMs for behavior change, and offer a high-fidelity, scalable testbed for advancing trustworthy persuasive AI in healthcare and beyond.