Time-To-Inconsistency: A Survival Analysis of Large Language Model Robustness to Adversarial Attacks

📅 2025-10-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing LLM robustness evaluations predominantly rely on single-turn static benchmarks, failing to capture the dynamic degradation of dialogue quality over multiple turns. This work introduces survival analysis—previously unexplored in dialogue robustness assessment—to model dialogue failure as a time-to-event outcome, thereby characterizing temporal failure patterns. We uncover a non-monotonic effect of semantic drift on system stability: gradual drift exhibits a protective effect, challenging the conventional assumption that strict semantic consistency is always necessary. We propose a novel evaluation paradigm grounded in Cox proportional hazards, accelerated failure time (AFT), and random survival forests, incorporating interaction terms to enhance interpretability. Evaluated on 36,951 real-world dialogue turns, the AFT model achieves significantly superior discrimination and calibration compared to baselines, markedly improving prediction accuracy for dialogue failure.
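The time-to-event framing treats each conversation as a subject that either fails (becomes inconsistent) at some turn or ends without failure and is right-censored. As a minimal sketch of this encoding, the following hand-rolled Kaplan-Meier estimator computes a survival curve over dialogue turns; the data is illustrative, not from the paper:

```python
# Kaplan-Meier survival estimate for dialogue failure, stdlib only.
# Each record: (turns_observed, failed). failed=False means the dialogue
# ended without a consistency failure (right-censored observation).
# The records below are illustrative, not drawn from the paper's data.
dialogues = [(3, True), (5, True), (5, False), (8, True), (10, False), (12, True)]

def kaplan_meier(records):
    """Return [(t, S(t))] evaluated at each observed failure time."""
    failure_times = sorted({t for t, failed in records if failed})
    survival, curve = 1.0, []
    for t in failure_times:
        at_risk = sum(1 for obs, _ in records if obs >= t)
        failures = sum(1 for obs, failed in records if obs == t and failed)
        survival *= 1 - failures / at_risk
        curve.append((t, survival))
    return curve

for t, s in kaplan_meier(dialogues):
    print(f"turn {t:2d}: S(t) = {s:.3f}")
```

Censored dialogues still count toward the at-risk set up to their last observed turn, which is what distinguishes this from naively averaging failure turns.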

📝 Abstract
Large Language Models (LLMs) have revolutionized conversational AI, yet their robustness in extended multi-turn dialogues remains poorly understood. Existing evaluation frameworks focus on static benchmarks and single-turn assessments, failing to capture the temporal dynamics of conversational degradation that characterize real-world interactions. In this work, we present the first comprehensive survival analysis of conversational AI robustness, analyzing 36,951 conversation turns across 9 state-of-the-art LLMs to model failure as a time-to-event process. Our survival modeling framework, employing Cox proportional hazards, Accelerated Failure Time, and Random Survival Forest approaches, reveals striking temporal dynamics. We find that abrupt, prompt-to-prompt (P2P) semantic drift is catastrophic, dramatically increasing the hazard of conversational failure. In stark contrast, gradual, cumulative drift is highly protective, vastly reducing the failure hazard and enabling significantly longer dialogues. AFT models with interactions demonstrate superior performance, achieving excellent discrimination and exceptional calibration. These findings establish survival analysis as a powerful paradigm for evaluating LLM robustness, offer concrete insights for designing resilient conversational agents, and challenge prevailing assumptions about the necessity of semantic consistency in conversational AI systems.
Problem

Research questions and friction points this paper is trying to address.

Analyzes LLM robustness degradation over multi-turn conversations
Models conversational failure as time-to-event survival process
Investigates impact of semantic drift patterns on dialogue longevity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Survival analysis models LLM failure as a time-to-event process
Cox and AFT models reveal temporal robustness dynamics
AFT models with interaction terms achieve superior discrimination and calibration
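The Cox proportional hazards framing behind these contributions multiplies a baseline hazard by exp(β·x), so the sign of each coefficient directly encodes the drift findings. A minimal sketch with hypothetical coefficients, chosen only to illustrate the reported sign pattern (positive for abrupt P2P drift, negative for cumulative drift); the values are not the paper's fitted estimates:

```python
import math

# Cox proportional hazards: h(t | x) = h0(t) * exp(beta . x).
# BETA is hypothetical and only illustrates the reported sign pattern:
# abrupt prompt-to-prompt drift raises the failure hazard,
# gradual cumulative drift lowers it (the protective effect).
BETA = {"p2p_drift": 1.2, "cumulative_drift": -0.8}

def hazard_ratio(covariates):
    """Hazard relative to the baseline h0(t); > 1 means faster failure."""
    return math.exp(sum(BETA[k] * v for k, v in covariates.items()))

abrupt = hazard_ratio({"p2p_drift": 1.0, "cumulative_drift": 0.0})
gradual = hazard_ratio({"p2p_drift": 0.0, "cumulative_drift": 1.0})
print(f"abrupt-drift hazard ratio:  {abrupt:.2f}")   # > 1: hazardous
print(f"gradual-drift hazard ratio: {gradual:.2f}")  # < 1: protective
```

An AFT model would instead act on the time scale, stretching or compressing expected survival time, which is why the two model families can be compared on the same discrimination and calibration metrics.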