Exploring the Latent Capacity of LLMs for One-Step Text Generation

📅 2025-05-27

🤖 AI Summary

This work investigates whether frozen large language models (LLMs) can non-autoregressively generate high-fidelity text spanning hundreds of tokens in a single forward pass, using only two learned input embeddings. Methodologically, we optimize these embeddings directly in the frozen model's input embedding space, combining an analysis of the geometry of the learned embeddings with empirical evaluation across multiple mainstream LLMs. Our key contribution is the first empirical demonstration that frozen LLMs possess an inherent capability for joint multi-token generation: although the learned embeddings for a given text are not unique, they form connected, local regions in the embedding space, which suggests that a dedicated encoder into that space could be learned. Experiments achieve single-step generation of over 100 tokens, with BLEU and ROUGE scores significantly surpassing autoregressive and other non-autoregressive baselines. These findings open a new pathway toward lightweight text encoders and highly efficient generative paradigms.

📝 Abstract

A recent study showed that large language models (LLMs) can reconstruct surprisingly long texts (up to thousands of tokens) via autoregressive generation from just one specially trained input embedding. In this work, we explore whether such reconstruction is possible without autoregression. We show that frozen LLMs can generate hundreds of accurate tokens in just one forward pass when provided with only two learned embeddings. This reveals a surprising and underexplored capability of LLMs: multi-token generation without iterative decoding. We investigate the behaviour of these embeddings and provide insight into the type of information they encode. We also show empirically that, although these representations are not unique for a given text, they form connected and local regions in embedding space, a property that suggests the potential of learning a dedicated encoder into that space.
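
The optimization described in the abstract can be illustrated with a deliberately tiny, pure-Python sketch. It is not the authors' implementation: the "frozen model" here is a hypothetical stand-in (a fixed random projection with a causal prefix mean), the input layout of one embedding `e` followed by repeated copies of a second embedding `m` is an assumption that this page does not specify, and the finite-difference optimizer is chosen only to keep the example dependency-free. All names and sizes (`D`, `V`, `N`, `forward`, `fit`) are illustrative.

```python
import math
import random

random.seed(0)
D, V, N = 4, 5, 6  # toy embedding width, vocabulary size, target length

# Frozen stand-in "model": these weights are never updated.
W_OUT = [[random.gauss(0, 1) for _ in range(D)] for _ in range(V)]
POS = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]

def forward(e, m):
    """One parallel pass over the assumed input [e, m, m, ...]: logits per position."""
    logits = []
    for t in range(N):
        # Causal prefix mean of e and the first t+1 copies of m, plus a position term.
        h = [math.tanh((e[k] + (t + 1) * m[k]) / (t + 2) + POS[t][k]) for k in range(D)]
        logits.append([sum(w[k] * h[k] for k in range(D)) for w in W_OUT])
    return logits

def loss(params, target):
    """Mean cross-entropy of the target tokens; params is e followed by m."""
    total = 0.0
    for row, tok in zip(forward(params[:D], params[D:]), target):
        mx = max(row)
        total += mx + math.log(sum(math.exp(x - mx) for x in row)) - row[tok]
    return total / len(target)

def decode(params):
    """Greedy token choice at every position, all from the same single pass."""
    return [max(range(V), key=row.__getitem__) for row in forward(params[:D], params[D:])]

def fit(target, steps=300, eps=1e-4):
    """Train only the two embeddings; a step is kept only if it lowers the loss."""
    params = [random.gauss(0, 0.1) for _ in range(2 * D)]
    history = [loss(params, target)]
    for _ in range(steps):
        base = history[-1]
        grad = []
        for i in range(len(params)):  # forward finite differences
            bumped = params[:]
            bumped[i] += eps
            grad.append((loss(bumped, target) - base) / eps)
        lr = 1.0
        while lr > 1e-6:  # backtracking: halve the step until the loss improves
            trial = [p - lr * g for p, g in zip(params, grad)]
            trial_loss = loss(trial, target)
            if trial_loss < base:
                params, base = trial, trial_loss
                break
            lr *= 0.5
        history.append(base)
    return params, history

target = [random.randrange(V) for _ in range(N)]
params, history = fit(target)
matches = sum(p == t for p, t in zip(decode(params), target))
print(f"loss {history[0]:.3f} -> {history[-1]:.3f}, {matches}/{N} tokens matched")
```

The point of the sketch is the division of labour: the model weights stay fixed, only the `2 * D` embedding values are trained, and every output position is read off the same forward pass. A toy this small is not expected to reconstruct its target perfectly; with a real LLM the gradients would come from backpropagation through the frozen network rather than finite differences.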

Problem

Research questions and friction points this paper is trying to address.

Exploring non-autoregressive text generation in LLMs

Investigating multi-token generation from learned embeddings

Analyzing embedding space properties for text representation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Frozen LLMs generate hundreds of accurate tokens in one forward pass

Two learned embeddings enable multi-token generation

Embeddings form connected regions in latent space
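
One simple way to probe whether two solutions for the same text lie in a connected region is to walk the straight line between them and check reconstruction accuracy at each point. The sketch below assumes this linear-interpolation probe for illustration; this page does not say how the authors test connectivity. The `decode` function is a hypothetical stand-in (one binary token per coordinate, chosen by sign) so that the example runs without a model; in practice it would be a single forward pass of the frozen LLM.

```python
def decode(point):
    """Hypothetical stand-in decoder: one binary token per coordinate, by sign."""
    return [1 if x > 0 else 0 for x in point]

def accuracy(point, target):
    """Fraction of target tokens reproduced when decoding from this point."""
    hits = sum(p == t for p, t in zip(decode(point), target))
    return hits / len(target)

def path_accuracies(sol_a, sol_b, target, steps=10):
    """Token accuracy at evenly spaced points on the segment from sol_a to sol_b."""
    scores = []
    for i in range(steps + 1):
        alpha = i / steps
        point = [(1 - alpha) * a + alpha * b for a, b in zip(sol_a, sol_b)]
        scores.append(accuracy(point, target))
    return scores

# Two different points that both decode to the same target text.
target = [1, 0, 1]
sol_a = [1.0, -2.0, 0.5]
sol_b = [3.0, -0.1, 2.0]

scores = path_accuracies(sol_a, sol_b, target)
connected_along_path = min(scores) == 1.0
print(scores, connected_along_path)
```

If accuracy stays at 1.0 along the whole segment, the two solutions are joined by a path of valid solutions. A drop in the middle would not by itself rule out connectivity, since a curved path might still exist, so a straight-line check is a sufficient test rather than a necessary one.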
