🤖 AI Summary
This study investigates whether large language models (LLMs) exhibit linguistic convergence in dialogue, that is, whether they actively adapt their output to users' linguistic styles. Method: We systematically analyze lexical, syntactic, and prosodic stylometric features across 16 LLMs generating responses on three dialogue corpora, employing continuation-based contrastive generation and controlled cross-model/cross-corpus experiments. Contribution/Results: We find pervasive, statistically significant convergence behavior across models; however, adaptation is frequently excessive, leading to stylistic overfitting. Surprisingly, instruction tuning and larger parameter counts consistently attenuate convergence, suggesting a mechanistic divergence from human sociocognitive adaptation. To our knowledge, this is the first empirical demonstration of structural biases in LLMs' interactive stylistic adaptability. Our findings reveal fundamental limitations in current LLMs' alignment with human-like dialogue dynamics and provide critical insights for developing more reliable, human-centered language models for collaborative interaction.
📝 Abstract
While large language models (LLMs) are generally considered proficient in generating language, how closely their language use resembles that of humans remains understudied. In this paper, we test whether models exhibit linguistic convergence, a core pragmatic element of human communication, asking: do models adapt, or converge, to the linguistic patterns of their user? To answer this, we systematically compare model completions of existing dialogues to the original human responses across sixteen language models, three dialogue corpora, and a variety of stylometric features. We find that models strongly converge to the conversation's style, often significantly overfitting relative to the human baseline. While convergence patterns are often feature-specific, we observe consistent shifts in convergence across modeling settings, with instruction-tuned and larger models converging less than their pretrained counterparts. Given the differences between human and model convergence patterns, we hypothesize that the underlying mechanisms for these behaviors are very different.
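To make the comparison concrete, the sketch below illustrates one way such a convergence measurement could be set up: extract a small stylometric profile from the dialogue context, then compare how far a model's completion and the original human response each sit from that profile. This is a minimal illustration, not the authors' actual pipeline; the feature set, the distance metric, and the `convergence_gap` helper are assumptions chosen for clarity.

```python
# Minimal illustrative sketch (not the paper's pipeline): score how closely a
# response's lexical style matches the preceding dialogue context, then compare
# a model completion against the original human response on that score.
# The feature set and distance metric are illustrative assumptions.

import re
from math import sqrt


def lexical_features(text: str) -> list[float]:
    """Toy stylometric profile: type-token ratio, mean word length,
    first-person pronoun rate, and punctuation rate."""
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    words = [t for t in tokens if t.isalpha()]
    if not words:
        return [0.0, 0.0, 0.0, 0.0]
    ttr = len(set(words)) / len(words)
    mean_len = sum(len(w) for w in words) / len(words)
    pron = sum(w in {"i", "me", "my", "we", "us", "our"} for w in words) / len(words)
    punct = sum(not t.isalnum() for t in tokens) / len(tokens)
    return [ttr, mean_len, pron, punct]


def style_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two feature vectors (smaller = more similar)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def convergence_gap(context: str, human_reply: str, model_reply: str) -> float:
    """Negative values mean the model reply is stylistically closer to the
    context than the human reply is, i.e., stronger convergence."""
    ctx = lexical_features(context)
    return (style_distance(ctx, lexical_features(model_reply))
            - style_distance(ctx, lexical_features(human_reply)))


if __name__ == "__main__":
    context = "honestly i just think we should go, you know? it's not a big deal."
    human = "Yeah, I suppose we could, though I'd rather wait until the weekend."
    model = "honestly yeah i think we should just go, it's really not a big deal."
    print(f"convergence gap: {convergence_gap(context, human, model):+.3f}")
```

In this toy setup, a negative gap for the model completion would correspond to the kind of over-convergence the abstract describes; the actual study uses far richer feature sets and statistical controls across models and corpora.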