Text Generation Beyond Discrete Token Sampling

📅 2025-05-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
In conventional autoregressive generation, large language models (LLMs) discard the full next-token distribution after sampling a discrete token at each step, an irreversible information loss that degrades generation quality and reasoning capability. To address this, the authors propose Mixture of Inputs (MoI), a training-free method that constructs continuous input representations by weighting and fusing the sampled token's embedding with the posterior expectation of its output distribution, thereby preserving and leveraging distributional information throughout generation. MoI introduces zero trainable parameters and incurs negligible computational overhead, enabling end-to-end retention and use of distributional information during inference. Evaluated on challenging tasks including mathematical reasoning, code generation, and PhD-level question answering, MoI consistently improves performance across diverse models such as QwQ-32B and Nemotron-Super-49B, without requiring fine-tuning, retraining, or architectural modification.

📝 Abstract
In standard autoregressive generation, an LLM predicts the next-token distribution, samples a discrete token, and then discards the distribution, passing only the sampled token as new input. To preserve this distribution's rich information, we propose Mixture of Inputs (MoI), a training-free method for autoregressive generation. After generating a token following the standard paradigm, we construct a new input that blends the generated discrete token with the previously discarded token distribution. Specifically, we employ a Bayesian estimation method that treats the token distribution as the prior, the sampled token as the observation, and replaces the conventional one-hot vector with the continuous posterior expectation as the new model input. MoI allows the model to maintain a richer internal representation throughout the generation process, resulting in improved text quality and reasoning capabilities. On mathematical reasoning, code generation, and PhD-level QA tasks, MoI consistently improves performance across multiple models including QwQ-32B, Nemotron-Super-49B, Gemma-3-27B, and DAPO-Qwen-32B, with no additional training and negligible computational overhead.
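The Bayesian update described in the abstract can be sketched as follows: treat the next-token distribution as the prior, the sampled token as a single observation, and feed the model a posterior-expectation-weighted average of token embeddings instead of the one-hot lookup. This is a minimal illustrative sketch, not the paper's exact estimator; the Dirichlet-style update and the observation weight `beta` are assumptions for illustration.

```python
import numpy as np

def moi_input(embedding_matrix, probs, sampled_id, beta=1.0):
    """Mixture of Inputs sketch: blend the sampled token's embedding
    with the expectation of the next-token distribution.

    embedding_matrix: (vocab_size, dim) token embedding table
    probs:            (vocab_size,) next-token distribution (the prior)
    sampled_id:       index of the token actually sampled (the observation)
    beta:             hypothetical observation weight (assumption)
    """
    # One-hot encoding of the sampled token (the observation).
    onehot = np.zeros_like(probs)
    onehot[sampled_id] = 1.0
    # Dirichlet-style posterior mean: prior mass plus beta pseudo-counts
    # on the observed token, renormalized to sum to 1.
    weights = (probs + beta * onehot) / (1.0 + beta)
    # Continuous input: probability-weighted average of embeddings,
    # replacing the conventional one-hot embedding lookup.
    return weights @ embedding_matrix
```

With `beta=0` the input is the pure distributional expectation; as `beta` grows it approaches the standard discrete-token embedding, so the parameter interpolates between the two regimes.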
Problem

Research questions and friction points this paper is trying to address.

Improving text generation by preserving next-token distribution information
Enhancing autoregressive models without additional training overhead
Boosting performance in reasoning and QA tasks via continuous inputs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Blends discrete tokens with token distributions
Uses Bayesian estimation for continuous inputs
Improves text quality without extra training