AI Summary
This work uncovers an intrinsic tension between training stability and generation diversity in large language models: overly stable training dynamics implicitly minimize the forward KL divergence, which reduces output entropy and degrades linguistic structure. To address this, we propose a feedback-driven controlled training framework that integrates the maximum likelihood objective with real-time analysis of generation statistics. Through systematic experiments across diverse architectures and random seeds, we demonstrate that stable training often yields low-entropy, repetitive outputs. Our findings challenge the prevailing assumption that training stability is a sufficient proxy for generation quality, offering a novel perspective on how optimization dynamics shape a model's expressive capacity.
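As a rough illustration of the feedback-driven setup described above, the sketch below pairs a standard cross-entropy step with a monitored generation statistic (mean token-level entropy of the model distribution) that can optionally be fed back into the loss. All names, including the feedback coefficient `lambda_H`, are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a feedback-driven training step: a standard
# maximum-likelihood update whose generation statistics (mean token-level
# entropy of p_theta) are monitored and optionally fed back as a penalty
# against entropy collapse. Names here are illustrative, not the paper's.
import torch
import torch.nn.functional as F

def training_step(model, batch, optimizer, lambda_H=0.0):
    """One MLE step with real-time entropy monitoring.

    batch: LongTensor of token ids, shape (batch, time).
    lambda_H: assumed feedback strength; 0.0 recovers plain MLE.
    """
    logits = model(batch[:, :-1])                      # (B, T-1, vocab)
    # Standard maximum-likelihood (next-token cross-entropy) objective.
    nll = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        batch[:, 1:].reshape(-1),
    )
    # Generation statistic: mean entropy of the per-token distribution.
    log_p = F.log_softmax(logits, dim=-1)
    token_entropy = -(log_p.exp() * log_p).sum(dim=-1).mean()
    # Feedback term: discourage the entropy reduction described above.
    loss = nll - lambda_H * token_entropy
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return nll.item(), token_entropy.item()

# Toy usage with a trivial next-token model over a 100-token vocabulary.
model = torch.nn.Sequential(torch.nn.Embedding(100, 32), torch.nn.Linear(32, 100))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
batch = torch.randint(0, 100, (8, 16))
nll, entropy = training_step(model, batch, optimizer, lambda_H=0.01)
```

With `lambda_H=0.0` this reduces to plain maximum likelihood, so the monitored entropy can be logged without altering the optimization, which is the comparison the experiments above call for.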
Abstract
Training stability is typically regarded as a prerequisite for reliable optimization in large language models. In this work, we analyze how stabilizing training dynamics affects the induced generation distribution. We show that under standard maximum likelihood training, stable parameter trajectories lead to stationary solutions that approximately minimize the forward KL divergence to the empirical distribution while implicitly reducing generative entropy. As a consequence, the learned model can concentrate probability mass on a limited subset of empirical modes, exhibiting systematic degeneration despite smooth loss convergence. We empirically validate this effect using a controlled feedback-based training framework that stabilizes internal generation statistics, observing consistently low-entropy outputs and repetitive behavior across architectures and random seeds. These results indicate that optimization stability and generative expressivity are not inherently aligned, and that stability alone is an insufficient indicator of generative quality.
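To make the forward-KL claim concrete, recall the standard decomposition (notation ours: \(\hat{p}\) denotes the empirical distribution, \(p_\theta\) the model):

```latex
% Standard identity: minimizing forward KL to the empirical distribution
% is equivalent to maximizing likelihood; the model's own entropy H(p_theta)
% does not appear in the objective.
\[
\mathrm{KL}\!\left(\hat{p} \,\middle\|\, p_\theta\right)
  = -H(\hat{p}) - \mathbb{E}_{x \sim \hat{p}}\!\left[\log p_\theta(x)\right],
\qquad
\arg\min_\theta \mathrm{KL}\!\left(\hat{p} \,\middle\|\, p_\theta\right)
  = \arg\max_\theta \mathbb{E}_{x \sim \hat{p}}\!\left[\log p_\theta(x)\right].
\]
```

Since \(H(\hat{p})\) is constant in \(\theta\), smooth loss convergence certifies only likelihood fit; nothing in the objective constrains the model entropy \(H(p_\theta)\), consistent with the implicit entropy reduction described above.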