Vulnerability-Amplifying Interaction Loops: a systematic failure mode in AI chatbot mental-health interactions

📅 2026-02-01
📈 Citations: 1
Influential: 0
🤖 AI Summary
This study addresses the risk that AI chatbots may inadvertently amplify users' psychological vulnerability during mental-health conversations, producing harmful "Vulnerability-Amplifying Interaction Loops" (VAILs). To this end, the authors propose SIM-VAIL, an auditing framework that formally defines and identifies VAILs as a novel failure mode in human–AI interaction. The framework pairs audited chatbots with simulated user agents drawn from 30 clinically grounded psychiatric user profiles and scores each conversational turn on 13 clinically relevant risk dimensions, yielding multidimensional, time-resolved risk assessments across 810 dialogues with nine commercial models. Results reveal that nearly all user phenotypes exhibit significant risks that accumulate over the course of a conversation and are phenotype-dependent. Furthermore, safety interventions targeting a single dimension may exacerbate harms in others, underscoring the necessity of coordinated, multidimensional evaluation in mental health–oriented AI systems.

📝 Abstract
Millions of users turn to consumer AI chatbots to discuss behavioral and mental health concerns. While this presents unprecedented opportunities to deliver population-level support, it also highlights an urgent need to develop rigorous and scalable safety evaluations. Here we introduce SIM-VAIL, an AI chatbot auditing framework that captures how harmful AI chatbot responses manifest across a range of mental-health contexts. SIM-VAIL pairs a simulated human user, harboring a distinct psychiatric vulnerability and conversational intent, with an audited frontier AI chatbot. It scores conversation turns on 13 clinically relevant risk dimensions, enabling context-dependent, temporally resolved assessment of mental-health risk. Across 810 conversations, encompassing over 90,000 turn-level ratings and 30 psychiatric user profiles, we find that significant risk occurs across virtually all user phenotypes. Risk manifested across most of the 9 consumer AI chatbot models audited, albeit mitigated in more modern variants. Rather than arising abruptly, risk accumulated over multiple turns. Risk profiles were phenotype-dependent, indicating that behaviors that appear supportive in general settings are liable to be maladaptive when they align with mechanisms that sustain a user's vulnerability. Multivariate risk patterns revealed trade-offs across dimensions, suggesting that mitigation targeting one harm domain can exacerbate others. These findings identify a novel failure mode in human-AI interactions, which we term Vulnerability-Amplifying Interaction Loops (VAILs), and underscore the need for multi-dimensional approaches to risk quantification. SIM-VAIL provides a scalable evaluation framework for quantifying how mental-health risk is distributed across user phenotypes, conversational trajectories, and clinically grounded behavioral dimensions, offering a foundation for targeted safety improvements.
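The audit loop the abstract describes can be sketched in outline: a simulated user with a fixed vulnerability profile converses with the model under test, and an independent rater scores every turn on each of the 13 risk dimensions, so that risk can be tracked as it accumulates across the dialogue. The sketch below is a minimal illustration under assumed interfaces; all names (`SimulatedUser`, `audit_conversation`, the placeholder dimension labels, and the stubbed agent/rater) are hypothetical, not the authors' implementation.

```python
from dataclasses import dataclass

# Placeholder labels for the paper's 13 clinical risk dimensions
# (the actual dimension names are defined in the paper, not here).
RISK_DIMENSIONS = [f"dim_{i}" for i in range(13)]

@dataclass
class SimulatedUser:
    """Hypothetical simulated user with a psychiatric vulnerability profile."""
    phenotype: str  # e.g. a clinically grounded user profile
    intent: str     # conversational intent

    def next_message(self, history):
        # A real agent would be an LLM conditioned on phenotype and intent;
        # this stub just emits a labeled turn marker.
        return f"[{self.phenotype}] turn {len(history) // 2 + 1}"

def audit_conversation(user, chatbot_reply, rate_turn, n_turns=10):
    """Run one simulated dialogue and collect turn-level risk scores.

    chatbot_reply: callable(history) -> str, the audited model under test.
    rate_turn: callable(history) -> dict mapping dimension -> risk score.
    """
    history, scores = [], []
    for _ in range(n_turns):
        msg = user.next_message(history)
        history.append(("user", msg))
        reply = chatbot_reply(history)      # audited chatbot's response
        history.append(("assistant", reply))
        scores.append(rate_turn(history))   # per-turn multidimensional rating
    # Cumulative risk per dimension, reflecting that risk builds over turns
    cumulative = {d: sum(s[d] for s in scores) for d in RISK_DIMENSIONS}
    return scores, cumulative
```

Summing per-turn scores into a per-dimension cumulative profile mirrors the paper's finding that risk accumulates over multiple turns rather than arising abruptly, and keeps the 13 dimensions separate so cross-dimension trade-offs remain visible.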
Problem

Research questions and friction points this paper is trying to address.

Vulnerability-Amplifying Interaction Loops
AI chatbot safety
mental-health risk
human-AI interaction
psychiatric vulnerability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vulnerability-Amplifying Interaction Loops
SIM-VAIL
mental-health risk assessment
AI chatbot safety
multi-dimensional risk quantification
Veith Weilnhammer
Max Planck UCL Centre for Computational Psychiatry and Ageing Research, London, UK
Kevin YC Hou
Sydney Medical School, University of Sydney, Sydney, Australia
Raymond Dolan
UCL
Neuroscience, Decision Making, Neural Replay, Reinforcement Learning (RL), Dopamine, Computational Psychiatry
Matthew M Nour
Department of Psychiatry, University of Oxford, Oxford, UK