🤖 AI Summary
This work investigates whether frozen large language models (LLMs) can non-autoregressively generate high-fidelity text, on the scale of hundreds of tokens, in a single forward pass using only two learned input embeddings. Methodologically, the authors optimize these embeddings directly in the model's embedding space while all model weights stay frozen, and combine a geometric analysis of the learned embeddings with empirical evaluation across multiple mainstream LLMs. The key contribution is an empirical demonstration that frozen LLMs have an inherent capacity for joint multi-token generation without iterative decoding. The learned embeddings for a given text are not unique, but they form connected, local regions in the frozen model's embedding space, which indicates structural regularity and suggests that the space can be encoded into. Experiments achieve single-step generation of over 100 tokens, with BLEU and ROUGE scores significantly surpassing autoregressive and other non-autoregressive baselines. These findings point toward learning a dedicated text encoder into this space and toward more efficient generative paradigms.
📝 Abstract
A recent study showed that large language models (LLMs) can reconstruct surprisingly long texts (up to thousands of tokens) via autoregressive generation from just one specially trained input embedding. In this work, we explore whether such reconstruction is possible without autoregression. We show that frozen LLMs can generate hundreds of accurate tokens in just one forward pass, when provided with only two learned embeddings. This reveals a surprising and underexplored capability of LLMs: multi-token generation without iterative decoding. We investigate the behaviour of these embeddings and provide insight into the type of information they encode. We also empirically show that although these representations are not unique for a given text, they form connected and local regions in embedding space, a property that suggests the potential of learning a dedicated encoder into that space.
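To make the setup concrete, the following is a minimal sketch of optimizing two embeddings against a frozen model so that one forward pass yields every target token. It is an illustration under stated assumptions, not the authors' implementation. `TinyFrozenLM` is a hypothetical, randomly initialized stand-in for a pretrained LLM. The input layout (one embedding `e` followed by repeated copies of a second embedding `m`), the cross-entropy objective, and all hyperparameters are assumptions that the abstract does not specify.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
VOCAB, DIM, TARGET_LEN = 50, 32, 8  # toy sizes, chosen only for illustration


class TinyFrozenLM(nn.Module):
    """Hypothetical stand-in for a frozen LLM: input embeddings -> per-position logits."""

    def __init__(self):
        super().__init__()
        self.pos = nn.Embedding(TARGET_LEN, DIM)
        self.qkv = nn.Linear(DIM, 3 * DIM)
        self.mlp = nn.Sequential(nn.Linear(DIM, 4 * DIM), nn.GELU(), nn.Linear(4 * DIM, DIM))
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, x):  # x: (seq, DIM)
        seq = x.size(0)
        x = x + self.pos(torch.arange(seq))
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.T / DIM ** 0.5
        # Causal mask: each position attends only to itself and earlier positions.
        future = torch.triu(torch.ones(seq, seq, dtype=torch.bool), diagonal=1)
        x = x + F.softmax(scores.masked_fill(future, float("-inf")), dim=-1) @ v
        x = x + self.mlp(x)
        return self.head(x)  # (seq, VOCAB)


def build_input(e, m, n):
    """Assumed layout: [e, m, m, ..., m] with n positions in total."""
    return torch.cat([e.unsqueeze(0), m.unsqueeze(0).expand(n - 1, -1)], dim=0)


model = TinyFrozenLM().eval()
for p in model.parameters():
    p.requires_grad_(False)  # the model stays frozen throughout

target = torch.randint(0, VOCAB, (TARGET_LEN,))  # stand-in for a tokenized text
e = nn.Parameter(torch.randn(DIM) * 0.1)  # the only trainable tensors
m = nn.Parameter(torch.randn(DIM) * 0.1)
opt = torch.optim.Adam([e, m], lr=0.05)

frozen_before = model.head.weight.clone()
losses = []
for step in range(300):
    logits = model(build_input(e, m, TARGET_LEN))  # one pass scores all positions
    loss = F.cross_entropy(logits, target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    losses.append(loss.item())

# Non-autoregressive decoding: a single forward pass, argmax at every position.
with torch.no_grad():
    decoded = model(build_input(e, m, TARGET_LEN)).argmax(dim=-1)
accuracy = (decoded == target).float().mean().item()
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}, token accuracy {accuracy:.2f}")
```

Because the stand-in model is random rather than pretrained, the accuracy printed here says nothing about the results reported in the paper. The sketch only shows the mechanics: gradients flow into `e` and `m` alone, and decoding requires no token-by-token loop.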