🤖 AI Summary
This study addresses a latent "dialogue escalation" risk in AI emotional companions, namely non-toxic yet progressively intensifying emotional reinforcement or affective drift that exacerbates user distress. To this end, we propose GAUGE, a framework that models the probabilistic dynamics of affect directly from an LLM's output logits. Because GAUGE requires no external classifiers, it enables fine-grained, real-time quantification of affective state transitions. Compared with standard toxicity filters and clinical assessment scales, GAUGE detects implicit affective harm at significantly higher rates while offering millisecond-level latency, high sensitivity, and lightweight deployment. By grounding safety monitoring in interpretable, logit-level dynamics, GAUGE establishes a practical, explainable paradigm for evaluating emotional safety in LLM-driven interpersonal interactions.
📝 Abstract
Large Language Models (LLMs) are increasingly integrated into everyday interactions, serving not only as information assistants but also as emotional companions. Even in the absence of explicit toxicity, repeated emotional reinforcement or affective drift can gradually escalate distress, a form of "implicit harm" that traditional toxicity filters fail to detect. Existing guardrail mechanisms often rely on external classifiers or clinical rubrics that may lag behind the nuanced, real-time dynamics of a developing conversation. To address this gap, we propose GAUGE (Guarding Affective Utterance Generation Escalation), a lightweight, logit-based framework for the real-time detection of hidden conversational escalation. GAUGE measures how an LLM's output probabilistically shifts the affective state of a dialogue.
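The abstract states only that GAUGE works directly on output logits to quantify affective shifts; the sketch below is a minimal illustration of that idea, not the paper's actual formulation. The affect lexicons (`DISTRESS_TOKENS`, `SOOTHING_TOKENS`), the signed shift score, and the cumulative-trend tracker are all hypothetical stand-ins for whatever probes and aggregation GAUGE really uses.

```python
# Illustrative sketch only: score each turn by how much next-token probability
# mass leans toward (hypothetical) distress-amplifying vs. soothing words, then
# track the running trend across turns to surface gradual escalation.
import math

# Hypothetical affect lexicons; the real framework's probes are not specified here.
DISTRESS_TOKENS = {"hopeless", "alone", "worthless", "panic"}
SOOTHING_TOKENS = {"calm", "safe", "supported", "breathe"}


def softmax(logits: dict[str, float]) -> dict[str, float]:
    """Convert raw token logits into a probability distribution."""
    m = max(logits.values())
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}


def affect_shift(logits: dict[str, float]) -> float:
    """Signed score in [-1, 1]: positive means mass leans toward distress tokens."""
    probs = softmax(logits)
    p_distress = sum(probs.get(t, 0.0) for t in DISTRESS_TOKENS)
    p_soothe = sum(probs.get(t, 0.0) for t in SOOTHING_TOKENS)
    total = p_distress + p_soothe
    return 0.0 if total == 0 else (p_distress - p_soothe) / total


def escalation_trend(per_turn_logits: list[dict[str, float]]) -> list[float]:
    """Cumulative mean of per-turn shifts; a rising curve flags hidden escalation."""
    scores, running = [], 0.0
    for i, logits in enumerate(per_turn_logits, start=1):
        running += affect_shift(logits)
        scores.append(running / i)
    return scores
```

In a real deployment the logits would come from the model's own generation step rather than a toy dictionary; the point the abstract makes is that the monitoring signal is derived from the generation distribution itself, with no separate classifier in the loop.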