How Alignment Shrinks the Generative Horizon

📅 2025-06-21
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Post-alignment large language models (LLMs) exhibit significantly reduced output diversity and enhanced generation stability, yet the underlying mechanisms remain unclear. Method: We propose the *branching factor* (BF), a metric quantifying the diversity of generative paths, and conduct probabilistic concentration analysis, prompt perturbation experiments, and chain-of-thought (CoT) behavioral modeling to investigate how alignment reshapes generation dynamics. Contribution/Results: We find that alignment does not alter the model's intrinsic capabilities but instead biases it toward inherently low-entropy, high-determinacy generation trajectories already present in the base model. Empirically, BF drops sharply from 12 to 1.2 after alignment, indicating a dramatic collapse in path diversity. Furthermore, CoT reasoning amplifies this effect by extending the inference chain into increasingly deterministic regimes. This work provides both a theoretical foundation and an interpretable, quantitative tool for understanding and controllably modulating LLM generation stability.

๐Ÿ“ Abstract
Despite their impressive capabilities, aligned large language models (LLMs) often generate outputs that lack diversity. What drives this stability in generation? We investigate this phenomenon through the lens of probability concentration in the model's output distribution. To quantify this concentration, we introduce the Branching Factor (BF) -- a token-invariant measure of the effective number of plausible next steps during generation. Our empirical analysis reveals two key findings: (1) BF often decreases as generation progresses, suggesting that LLMs become more predictable as they generate. (2) Alignment tuning substantially sharpens the model's output distribution from the outset, reducing BF by nearly an order of magnitude (e.g., from 12 to 1.2) relative to base models. This stark reduction helps explain why aligned models often appear less sensitive to decoding strategies. Building on this insight, we find this stability has surprising implications for complex reasoning. Aligned Chain-of-Thought (CoT) models (e.g., DeepSeek-distilled models), for instance, leverage this effect; by generating longer reasoning chains, they push generation into later, more deterministic (lower-BF) stages, resulting in more stable outputs. We hypothesize that alignment tuning does not fundamentally change a model's behavior, but instead steers it toward stylistic tokens (e.g., "Sure") that unlock low-entropy trajectories already present in the base model. This view is supported by nudging experiments, which show that prompting base models with such tokens can similarly reduce BF. Together, our findings establish BF as a powerful diagnostic for understanding and controlling LLM outputs -- clarifying how alignment reduces variability, how CoT promotes stable generations, and how base models can be steered away from diversity.
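The abstract describes BF as the "effective number of plausible next steps" at a generation step. A minimal sketch of one natural reading of that definition, assuming BF at a step is the perplexity (exponentiated Shannon entropy) of the next-token distribution; the function name `branching_factor` is hypothetical, not from the paper:

```python
import numpy as np

def branching_factor(probs):
    """Effective number of plausible next tokens: the exponential of
    the Shannon entropy (i.e., the perplexity) of a next-token
    distribution. A uniform distribution over k tokens gives BF = k;
    a one-hot distribution gives BF = 1."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]  # drop zeros; 0 * log 0 is taken as 0
    entropy = -np.sum(p * np.log(p))
    return float(np.exp(entropy))

# Uniform over 12 tokens: BF = 12 (diverse, base-model-like regime)
print(branching_factor(np.full(12, 1 / 12)))  # -> 12.0

# Sharply peaked distribution: BF near 1 (aligned-model-like regime)
print(branching_factor([0.97, 0.01, 0.01, 0.01]))
```

Under this reading, the paper's reported drop from BF ≈ 12 to BF ≈ 1.2 corresponds to the next-token distribution sharpening from roughly a dozen viable continuations to nearly one.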
Problem

Research questions and friction points this paper is trying to address.

Aligned LLMs lack diversity in generated outputs
Alignment tuning sharpens output distribution, reducing variability
Branching Factor measures predictability in model generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces Branching Factor (BF) metric
Alignment tuning sharpens output distribution
CoT leverages deterministic generation stages