Do You Feel Comfortable? Detecting Hidden Conversational Escalation in AI Chatbots

📅 2025-12-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the latent “dialogue escalation” risk in AI emotional companions—namely, non-toxic yet progressively intensifying emotional reinforcement or affective drift that exacerbates user distress. To this end, we propose GAUGE, a novel framework that introduces probabilistic dynamic modeling based on LLM output logits. Unlike conventional approaches, GAUGE operates directly on logits without external classifiers, enabling fine-grained, real-time quantification of affective state transitions. Evaluated against standard toxicity filters and clinical assessment scales, GAUGE achieves significantly higher detection rates for implicit affective harm, while offering millisecond-level latency, high sensitivity, and lightweight deployment. By grounding safety monitoring in interpretable, logit-level dynamics, GAUGE establishes a practical, explainable paradigm for evaluating emotional safety in interpersonal interactions driven by large language models.

📝 Abstract
Large Language Models (LLMs) are increasingly integrated into everyday interactions, serving not only as information assistants but also as emotional companions. Even in the absence of explicit toxicity, repeated emotional reinforcement or affective drift can gradually escalate distress—a form of “implicit harm” that traditional toxicity filters fail to detect. Existing guardrail mechanisms often rely on external classifiers or clinical rubrics that may lag behind the nuanced, real-time dynamics of a developing conversation. To address this gap, we propose GAUGE (Guarding Affective Utterance Generation Escalation), a lightweight, logit-based framework for the real-time detection of hidden conversational escalation. GAUGE measures how an LLM's output probabilistically shifts the affective state of a dialogue.
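The abstract's core idea—tracking how output logits shift a dialogue's affective state turn by turn—can be illustrated with a toy sketch. Everything below (the affect-token set, the turn-over-turn delta as an escalation score) is a hypothetical simplification for intuition, not the paper's actual formulation:

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def affect_mass(logits, affect_ids):
    """Probability mass the model assigns to affect-laden tokens
    (affect_ids is a hypothetical set of vocabulary indices)."""
    probs = softmax(logits)
    return sum(probs[i] for i in affect_ids)

def escalation_score(turn_logits, affect_ids):
    """Mean turn-over-turn increase in affect mass.
    A persistently positive score suggests affective drift,
    even when no single turn trips a toxicity filter."""
    masses = [affect_mass(step, affect_ids) for step in turn_logits]
    deltas = [b - a for a, b in zip(masses, masses[1:])]
    return sum(deltas) / len(deltas)

# Toy example: token 0 is the lone "affect" token, and its logit
# rises across three turns, so the escalation score is positive.
turns = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
print(escalation_score(turns, {0}))
```

Because the signal is read straight from the logits the model already produces, a monitor like this adds no external classifier pass—consistent with the lightweight, low-latency deployment the summary claims.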
Problem

Research questions and friction points this paper is trying to address.

Detect hidden conversational escalation in AI chatbots
Address implicit harm from emotional reinforcement or drift
Provide real-time detection beyond traditional toxicity filters
Innovation

Methods, ideas, or system contributions that make the work stand out.

Real-time detection of hidden conversational escalation
Lightweight logit-based framework GAUGE
Measures LLM output's affective state shifts
Jihyung Park — The University of Texas at Austin
Saleh Afroogh — The University of Texas at Austin
Junfeng Jiao — Associate Professor, Urban Information Lab, Texas Smart City, NSF NRT AI, UT Austin

AI · Smart City · Urban Informatics