How Many Different Outputs Can a Transformer Generate?

📅 2026-05-21

🤖 AI Summary

This work studies how many distinct output sequences a Transformer can generate for a prompt of a given length, and offers a theoretical explanation for its observed failures on simple tasks such as copying and cramming. The analysis shows that the maximum length of accessible sequences (those the model can output for some prompt) grows linearly with prompt length, and that beyond a critical length the fraction of accessible sequences decays exponentially with sequence length. Both results hold even with unbounded context and computation time. The resulting upper bound depends on only a handful of architectural characteristics and is shown empirically to be tight up to a factor of less than 10 across architectures and model sizes, giving a quantitative characterization of the expressive limits of Transformers.

📝 Abstract

We study how we can leverage only a handful of characteristics of a transformer's architecture to closely predict the number of different sequences it can output, both qualitatively and quantitatively. We provide an upper bound depending on the length of the prompt, which we show empirically to be tight up to a factor less than 10, across architectures and model sizes. Our analysis also provides a theoretical explanation for previously observed empirical failures of transformers on simple sequence tasks, such as copying and cramming. Formally, we prove that (i) the maximal length of accessible sequences (those that the transformer can output for some prompt) grows linearly with the prompt length, (ii) beyond a critical threshold, the proportion of accessible sequences decays exponentially with sequence length, and (iii) the linear coefficient relating prompt length to accessible sequence length admits a theoretical upper bound. Notably, these results hold even with unbounded context and computation time.
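The qualitative shape of results (i) and (ii) can be illustrated with a much cruder counting argument than the one in the paper. The sketch below is not the paper's bound: it assumes discrete-token prompts of a fixed length and deterministic decoding, so each prompt maps to exactly one output and the number of accessible sequences cannot exceed the number of prompts. The function names and the toy vocabulary size are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def max_accessible(vocab_size: int, prompt_len: int) -> int:
    """Pigeonhole cap (illustrative assumption, not the paper's bound).

    With deterministic decoding, each prompt yields one output, so there
    are at most as many distinct outputs as prompts of this length.
    """
    return vocab_size ** prompt_len


def accessible_fraction_bound(vocab_size: int, prompt_len: int, seq_len: int) -> Fraction:
    """Upper bound on the share of length-seq_len sequences that are accessible."""
    total_sequences = vocab_size ** seq_len
    cap = max_accessible(vocab_size, prompt_len)
    return min(Fraction(1), Fraction(cap, total_sequences))


if __name__ == "__main__":
    # Toy setting: binary vocabulary, prompts of 3 tokens.
    for seq_len in range(1, 8):
        print(seq_len, accessible_fraction_bound(2, 3, seq_len))
```

Under these assumptions the bound stays at 1 until the output is as long as the prompt, then shrinks by a factor of the vocabulary size for every extra token. That matches the form of the stated results: a threshold that scales linearly with prompt length, followed by exponential decay. The paper's actual bound is derived from architectural characteristics and is not reproduced here.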

Problem

Research questions and friction points this paper is trying to address.

transformer

output sequences

sequence generation

architectural limits

accessible sequences

Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer expressivity

sequence generation capacity

theoretical upper bound