Vulnerability-Amplifying Interaction Loops: a systematic failure mode in AI chatbot mental-health interactions

📅 2026-02-01
📈 Citations: 1
Influential: 0
🤖 AI Summary
This study addresses the risk that AI chatbots may inadvertently amplify users' psychological vulnerability during mental-health conversations, producing harmful "Vulnerability-Amplifying Interaction Loops" (VAILs). To this end, the authors propose SIM-VAIL, an auditing framework that formally defines and identifies VAILs as a novel failure mode in human–AI interaction. The framework pairs audited chatbots with simulated user agents drawn from 30 clinically grounded psychiatric user profiles and scores each conversational turn on 13 clinically relevant risk dimensions, yielding multidimensional, time-resolved risk assessments across 810 dialogues with nine commercial models. Results reveal that nearly all user phenotypes exhibit significant risks that accumulate over the course of a conversation and are phenotype-dependent. Furthermore, safety interventions targeting a single dimension may exacerbate harms in others, underscoring the necessity of coordinated, multidimensional evaluation in mental health–oriented AI systems.

📝 Abstract
Millions of users turn to consumer AI chatbots to discuss behavioral and mental health concerns. While this presents unprecedented opportunities to deliver population-level support, it also highlights an urgent need to develop rigorous and scalable safety evaluations. Here we introduce SIM-VAIL, an AI chatbot auditing framework that captures how harmful AI chatbot responses manifest across a range of mental-health contexts. SIM-VAIL pairs a simulated human user, harboring a distinct psychiatric vulnerability and conversational intent, with an audited frontier AI chatbot. It scores conversation turns on 13 clinically relevant risk dimensions, enabling context-dependent, temporally resolved assessment of mental-health risk. Across 810 conversations, encompassing over 90,000 turn-level ratings and 30 psychiatric user profiles, we find that significant risk occurs across virtually all user phenotypes. Risk manifested across most of the 9 consumer AI chatbot models audited, albeit mitigated in more modern variants. Rather than arising abruptly, risk accumulated over multiple turns. Risk profiles were phenotype-dependent, indicating that behaviors that appear supportive in general settings are liable to be maladaptive when they align with mechanisms that sustain a user's vulnerability. Multivariate risk patterns revealed trade-offs across dimensions, suggesting that mitigation targeting one harm domain can exacerbate others. These findings identify a novel failure mode in human-AI interactions, which we term Vulnerability-Amplifying Interaction Loops (VAILs), and underscore the need for multi-dimensional approaches to risk quantification. SIM-VAIL provides a scalable evaluation framework for quantifying how mental-health risk is distributed across user phenotypes, conversational trajectories, and clinically grounded behavioral dimensions, offering a foundation for targeted safety improvements.
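The audit loop the abstract describes can be sketched in outline: a simulated user with a fixed vulnerability profile converses with the model under test, and an independent rater scores every turn on each of the 13 risk dimensions, so that risk can be tracked as it accumulates across the dialogue. The sketch below is a minimal illustration under assumed interfaces; all names (`SimulatedUser`, `audit_conversation`, the placeholder dimension labels, and the stubbed agent/rater) are hypothetical, not the authors' implementation.

```python
from dataclasses import dataclass

# Placeholder labels for the paper's 13 clinical risk dimensions
# (the actual dimension names are defined in the paper, not here).
RISK_DIMENSIONS = [f"dim_{i}" for i in range(13)]

@dataclass
class SimulatedUser:
    """Hypothetical simulated user with a psychiatric vulnerability profile."""
    phenotype: str  # e.g. a clinically grounded user profile
    intent: str     # conversational intent

    def next_message(self, history):
        # A real agent would be an LLM conditioned on phenotype and intent;
        # this stub just emits a labeled turn marker.
        return f"[{self.phenotype}] turn {len(history) // 2 + 1}"

def audit_conversation(user, chatbot_reply, rate_turn, n_turns=10):
    """Run one simulated dialogue and collect turn-level risk scores.

    chatbot_reply: callable(history) -> str, the audited model under test.
    rate_turn: callable(history) -> dict mapping dimension -> risk score.
    """
    history, scores = [], []
    for _ in range(n_turns):
        msg = user.next_message(history)
        history.append(("user", msg))
        reply = chatbot_reply(history)      # audited chatbot's response
        history.append(("assistant", reply))
        scores.append(rate_turn(history))   # per-turn multidimensional rating
    # Cumulative risk per dimension, reflecting that risk builds over turns
    cumulative = {d: sum(s[d] for s in scores) for d in RISK_DIMENSIONS}
    return scores, cumulative
```

Summing per-turn scores into a per-dimension cumulative profile mirrors the paper's finding that risk accumulates over multiple turns rather than arising abruptly, and keeps the 13 dimensions separate so cross-dimension trade-offs remain visible.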
Problem

Research questions and friction points this paper is trying to address.

Vulnerability-Amplifying Interaction Loops
AI chatbot safety
mental-health risk
human-AI interaction
psychiatric vulnerability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vulnerability-Amplifying Interaction Loops
SIM-VAIL
mental-health risk assessment
AI chatbot safety
multi-dimensional risk quantification
Veith Weilnhammer
Max Planck UCL Centre for Computational Psychiatry and Ageing Research, London, UK
Kevin YC Hou
Sydney Medical School, University of Sydney, Sydney, Australia
Raymond Dolan
UCL
Neuroscience, Decision Making, Neural Replay, Reinforcement Learning (RL), Dopamine, Computational Psychiatry
Matthew M Nour
Department of Psychiatry, University of Oxford, Oxford, UK