Learning Tractable Distributions Of Language Model Continuations

📅 2025-11-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses the challenge of modeling sequence-level constraints, such as grammaticality, stylistic consistency, and safety, in autoregressive language generation. It proposes Learning to Look Ahead (LTLA), a framework that pairs a lightweight, tractable proxy model (e.g., an HMM) with a large language model (LLM). LTLA uses the LLM's contextual encoding to condition the proxy model's hidden-state prior, enabling efficient and precise modeling of continuation distributions. By employing batched HMM updates and reusing computation across prefixes, LTLA keeps inference overhead low. Compared to weakly context-aware baselines, LTLA maintains generation fluency while substantially improving constraint satisfaction rates and conditional likelihood on grammatical correctness, stylistic alignment, and safety filtering tasks. The framework also extends to joint vision-language modeling.

📝 Abstract
Controlled language generation conditions text on sequence-level constraints (for example, syntax, style, or safety). These constraints may depend on future tokens, which makes directly conditioning an autoregressive language model (LM) generally intractable. Prior work uses tractable surrogates such as hidden Markov models (HMMs) to approximate the distribution over continuations and adjust the model's next-token logits at decoding time. However, we find that these surrogates are often weakly context aware, which reduces query quality. We propose Learning to Look Ahead (LTLA), a hybrid approach that pairs the same base language model for rich prefix encoding with a fixed tractable surrogate model that computes exact continuation probabilities. Two efficiency pitfalls arise when adding neural context: (i) naively rescoring the prefix with every candidate next token requires a sweep over the entire vocabulary at each step, and (ii) predicting fresh surrogate parameters for each prefix, although tractable at a single step, forces recomputation of future probabilities for every new prefix and eliminates reuse. LTLA avoids both by using a single batched HMM update to account for all next-token candidates at once, and by conditioning only the surrogate's latent state prior on the LM's hidden representations while keeping the surrogate decoder fixed, so computations can be reused across prefixes. Empirically, LTLA attains higher conditional likelihood than an unconditional HMM, approximates continuation distributions for vision-language models where a standalone HMM cannot encode visual context, and improves constraint satisfaction at comparable fluency on controlled-generation tasks, with minimal inference overhead.
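The batched HMM update described in the abstract can be sketched numerically: instead of rescoring the prefix once per candidate next token, a single matrix operation updates the surrogate's belief state for every vocabulary entry at once. A minimal NumPy sketch with toy, untrained parameters (the sizes, random parameters, and variable names here are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
K, V = 4, 10  # number of hidden states, vocabulary size (toy sizes)

# Random row-stochastic HMM parameters (illustrative, not trained).
A = rng.random((K, K)); A /= A.sum(axis=1, keepdims=True)   # state transitions
B = rng.random((K, V)); B /= B.sum(axis=1, keepdims=True)   # token emissions
alpha = rng.random(K);  alpha /= alpha.sum()                # belief over states given the prefix

# One batched update covers every candidate next token at once:
# column v of alpha_next is the (unnormalized) belief if token v were appended.
alpha_next = (alpha @ A)[:, None] * B        # shape (K, V)
p_next = alpha_next.sum(axis=0)              # p(next token = v | prefix)

# Because A and B are row-stochastic and alpha is normalized,
# the next-token marginals form a proper distribution.
assert np.isclose(p_next.sum(), 1.0)
```

This single `(K, V)` product replaces a vocabulary-sized sweep of per-token forward updates, which is the efficiency pitfall (i) the abstract mentions.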
Problem

Research questions and friction points this paper is trying to address.

Autoregressive language models struggle with future-dependent sequence constraints
Existing surrogate models lack context awareness and reduce query quality
Efficient neural context integration requires avoiding vocabulary-sized rescoring sweeps
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid model pairs LM with tractable surrogate
Single batched HMM update processes all tokens
Conditioned latent state enables computation reuse
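The last point above, conditioning only the surrogate's latent-state prior on the LM's hidden representation while keeping the surrogate's transition and emission parameters fixed, can be sketched as a small linear head followed by a softmax. A hedged illustration (the head `conditioned_prior`, its shapes, and its parameters are assumptions for exposition, not the paper's architecture):

```python
import numpy as np

def conditioned_prior(h, W, b):
    """Map an LM prefix representation h (shape (d,)) to a distribution
    over K surrogate hidden states via a linear head (W: (K, d), b: (K,)).
    Only this prior depends on context; the surrogate's transition and
    emission matrices stay fixed, so quantities computed from them can be
    cached and reused across prefixes."""
    logits = W @ h + b
    z = logits - logits.max()          # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

d, K = 8, 4  # LM hidden size and number of surrogate states (toy sizes)
rng = np.random.default_rng(1)
prior = conditioned_prior(rng.standard_normal(d),
                          rng.standard_normal((K, d)),
                          np.zeros(K))
```

Keeping the decoder fixed is what avoids efficiency pitfall (ii) in the abstract: predicting fresh surrogate parameters per prefix would invalidate all cached future-probability computations.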