Distribution Prompting: Understanding the Expressivity of Language Models Through the Next-Token Distributions They Can Produce

📅 2025-05-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work systematically investigates the expressivity limits of autoregressive language models (LMs): which next-token probability distributions can be induced via prompting. Method: a unified framework for soft and hard gradient-based prompt optimization that minimizes KL divergence to search for prompts whose induced next-token distribution best approximates a target. Contribution/Results: both distributional entropy and the presence of "outlier tokens" jointly determine inducibility: low- and high-entropy distributions are readily approximated; among medium-entropy targets, those containing outlier tokens are significantly easier to approximate than uniform ones; and distributions generated by LMs themselves are reconstructed with minimal error. Cross-tokenizer experiments show that targets generated by other LMs, even ones with different tokenizers, achieve on average 42% lower KL error than random baselines, demonstrating structured, model-dependent constraints on expressivity. These findings offer empirical evidence for understanding the probabilistic modeling capacity of LMs.
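The optimization described above can be sketched in miniature. The toy below substitutes a frozen random linear "LM head" over a pooled soft prompt for a real LM, and minimizes KL(target || model) by gradient descent on the prompt embeddings; the sizes, pooling step, learning rate, and target distribution are illustrative assumptions, not the paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)
V, D, L = 6, 12, 4           # toy vocab size, embedding dim, soft-prompt length
W = rng.normal(size=(V, D))  # frozen stand-in for the LM: logits = W @ pooled prompt

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def kl(t, p):
    """KL(t || p) with a small epsilon for numerical safety."""
    return float(np.sum(t * (np.log(t + 1e-12) - np.log(p + 1e-12))))

# Target: medium-entropy distribution with one "outlier token" (illustrative choice)
target = np.full(V, 0.5 / (V - 1))
target[0] = 0.5

prompt = 0.1 * rng.normal(size=(L, D))  # trainable soft-prompt embeddings
lr = 0.06
init_kl = kl(target, softmax(W @ prompt.mean(axis=0)))

for _ in range(20_000):
    pooled = prompt.mean(axis=0)         # stand-in for running the LM over the prompt
    p = softmax(W @ pooled)
    grad_pooled = W.T @ (p - target)     # dKL/dlogits = p - target, chained through W
    prompt -= lr * grad_pooled / L       # each position shares the pooled gradient

final_kl = kl(target, softmax(W @ prompt.mean(axis=0)))
```

Because the toy "model" is linear, the loss is convex in the pooled prompt and gradient descent drives the KL close to zero; with a real LM the loss surface is non-convex, which is part of why some targets are hard to elicit.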

📝 Abstract
Autoregressive neural language models (LMs) generate a probability distribution over tokens at each time step given a prompt. In this work, we attempt to systematically understand the probability distributions that LMs can produce, showing that some distributions are significantly harder to elicit than others. Specifically, for any target next-token distribution over the vocabulary, we attempt to find a prompt that induces the LM to output a distribution as close as possible to the target, using either soft or hard gradient-based prompt tuning. We find that (1) in general, distributions with very low or very high entropy are easier to approximate than those with moderate entropy; (2) among distributions with the same entropy, those containing "outlier tokens" are easier to approximate; (3) target distributions generated by LMs, even LMs with different tokenizers, are easier to approximate than randomly chosen targets. These results offer insights into the expressiveness of LMs and the challenges of using them as probability distribution proposers.
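To make findings (1) and (2) concrete, the snippet below constructs the entropy regimes the abstract refers to; the vocabulary size and probability values are arbitrary illustrative choices, not taken from the paper:

```python
import numpy as np

V = 100  # toy vocabulary size (illustrative)

def entropy(p):
    """Shannon entropy in nats, ignoring zero-probability tokens."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

# Low entropy: nearly all mass on one token (easy to elicit, per finding 1)
low = np.full(V, 1e-4)
low[0] = 1.0 - 1e-4 * (V - 1)

# High entropy: uniform over the full vocabulary (also easy, per finding 1)
high = np.full(V, 1.0 / V)

# Medium entropy, no outliers: uniform over a small subset (the hard regime)
mid_uniform = np.zeros(V)
mid_uniform[:10] = 0.1

# Medium entropy with an "outlier token" holding half the mass
# (easier to approximate than the subset-uniform case, per finding 2)
mid_outlier = np.full(V, 0.5 / (V - 1))
mid_outlier[0] = 0.5
```

Here `entropy(high)` equals `log(V)`, the maximum possible, while the two medium-entropy targets sit between the extremes; the paper's comparison in finding (2) holds entropy fixed, which this sketch only approximates.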
Problem

Research questions and friction points this paper is trying to address.

Understanding LM expressivity via next-token distributions
Assessing difficulty in eliciting specific LM distributions
Comparing approximation difficulty for LM-generated vs. random target distributions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses gradient-based prompt tuning for distributions
Analyzes LM expressivity via next-token distributions
Compares target distribution approximation difficulties