The Rise of Verbal Tics in Large Language Models: A Systematic Analysis Across Frontier Models

📅 2026-04-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the prevalence of repetitive verbal tics, redundant phrases that impair output naturalness, in large language models following alignment training. The authors introduce the Verbal Tic Index (VTI), a composite metric of tic prevalence, and use a standardized API-based evaluation framework to quantify such behaviors across eight leading models. Drawing on 160,000 model responses to English and Chinese prompts, together with human evaluation (N = 120), the study analyzes how VTI correlates with sycophancy, lexical diversity, and human-perceived naturalness, and reports a strong inverse relationship between sycophancy and perceived naturalness (r = -0.87, p < 0.001). Among the evaluated models, Gemini 3.1 Pro shows the highest VTI (0.590) and DeepSeek V3.2 the lowest (0.295). Verbal tics also accumulate over multi-turn dialogues, are amplified in subjective tasks, and vary across languages.

📝 Abstract

As Large Language Models (LLMs) continue to evolve through alignment techniques such as Reinforcement Learning from Human Feedback (RLHF) and Constitutional AI, a growing and increasingly conspicuous phenomenon has emerged: the proliferation of verbal tics -- repetitive, formulaic linguistic patterns that pervade model outputs. These range from sycophantic openers ("That's a great question!", "Awesome!") to pseudo-empathetic affirmations ("I completely understand your concern", "I'm right here to catch you") and overused vocabulary ("delve", "tapestry", "nuanced"). In this paper, we present a systematic analysis of the verbal tic phenomenon across eight state-of-the-art LLMs: GPT-5.4, Claude Opus 4.7, Gemini 3.1 Pro, Grok 4.2, Doubao-Seed-2.0-pro, Kimi K2.5, DeepSeek V3.2, and MiMo-V2-Pro. Utilizing a custom evaluation framework for standardized API-based evaluation, we assess 10,000 prompts across 10 task categories in both English and Chinese, yielding 160,000 model responses. We introduce the Verbal Tic Index (VTI), a composite metric quantifying tic prevalence, and analyze its correlation with sycophancy, lexical diversity, and human-perceived naturalness. Our findings reveal significant inter-model variation: Gemini 3.1 Pro exhibits the highest VTI (0.590), while DeepSeek V3.2 achieves the lowest (0.295). We further demonstrate that verbal tics accumulate over multi-turn conversations, are amplified in subjective tasks, and show distinct cross-lingual patterns. Human evaluation (N = 120) confirms a strong inverse relationship between sycophancy and perceived naturalness (r = -0.87, p < 0.001). These results underscore the "alignment tax" of current training paradigms and highlight the urgent need for more authentic human-AI interaction frameworks.
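
The abstract does not say how the VTI is composed, so the following is only a minimal sketch of one ingredient such a metric might plausibly use: the share of responses that contain at least one of the tic phrases quoted above. The phrase list, the function name `tic_rate`, and the sample responses are illustrative assumptions, not the authors' method or lexicon. The last lines check the reported response count, assuming each of the 10,000 prompts is posed in both languages to all eight models.

```python
import re

# Tic phrases quoted in the abstract; the paper's actual lexicon is not given here.
TIC_PATTERNS = [
    r"that's a great question",
    r"\bawesome\b",
    r"i completely understand your concern",
    r"\bdelve\b",
    r"\btapestry\b",
    r"\bnuanced\b",
]
TIC_RE = re.compile("|".join(TIC_PATTERNS), re.IGNORECASE)


def tic_rate(responses):
    """Fraction of responses containing at least one listed tic phrase.

    Hypothetical helper: a stand-in for one possible VTI component,
    not the composite index defined in the paper.
    """
    if not responses:
        return 0.0
    hits = sum(1 for text in responses if TIC_RE.search(text))
    return hits / len(responses)


# Made-up sample responses, for illustration only.
sample = [
    "That's a great question! Let's delve into it.",
    "The function returns None when the key is missing.",
    "Awesome! Here is a nuanced view.",
    "Use a dict for constant-time lookup.",
]
print(tic_rate(sample))  # 0.5 (two of the four responses match)

# Response count, assuming every prompt runs in both languages on all eight models.
total_responses = 10_000 * 2 * 8
print(total_responses)  # 160000
```

A per-response flag like this ignores how many tics a response contains and where they occur; a composite index would presumably weight several such signals.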

Problem

Research questions and friction points this paper is trying to address.

verbal tics

large language models

alignment

sycophancy

naturalness

Innovation

Methods, ideas, or system contributions that make the work stand out.

Verbal Tics

Verbal Tic Index (VTI)

Alignment Tax