Stop-Think-AutoRegress: Language Modeling with Latent Diffusion Planning

📅 2026-02-24
📈 Citations: 2
✨ Influential: 0
🤖 AI Summary
Traditional autoregressive language models lack global semantic planning because of their token-by-token generation paradigm, which often compromises coherence and commonsense reasoning in long-form text. This work proposes STAR-LDM, an architecture that integrates latent diffusion models into the autoregressive generation process. Through a "pause-and-think" mechanism, STAR-LDM optimizes a global semantic plan in continuous latent space before producing discrete tokens. The approach enables fine-grained attribute control without retraining and significantly outperforms same-scale baselines on language understanding benchmarks. LLM-as-judge evaluations show win rates above 70% for narrative coherence and commonsense reasoning, and the method strikes a strong balance between controllability and fluency.
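
The pause-and-think mechanism can be pictured as a two-phase sampler: first denoise a continuous latent plan, then autoregress conditioned on it. The sketch below is an illustration under assumptions, not the authors' implementation; the `denoiser` and `decoder` interfaces, the plan dimensionality, the alpha-bar schedule, and the DDIM-style update are all placeholders, and only the overall control flow reflects the paper's description.

```python
import torch

@torch.no_grad()
def stop_think_autoregress(denoiser, decoder, prompt_ids,
                           plan_dim=768, num_steps=50, max_new_tokens=128):
    """Sketch of a stop-think-autoregress loop.

    denoiser(z_t, t, ctx) -> predicted clean plan latent (hypothetical interface)
    decoder(ids, plan=z)  -> next-token logits conditioned on the plan (hypothetical)
    prompt_ids            -> list[int] of prompt token ids
    """
    ctx = decoder.encode(prompt_ids)  # hypothetical: context features for the planner

    # "Think": refine a semantic plan in continuous latent space before
    # emitting any discrete tokens.
    abar = torch.linspace(1.0 - 1e-4, 1e-3, num_steps)  # assumed alpha-bar schedule
    z = torch.randn(1, plan_dim)                        # start from pure noise
    for t in reversed(range(num_steps)):
        z0_hat = denoiser(z, torch.tensor([t]), ctx)    # predict the clean plan
        eps_hat = (z - abar[t].sqrt() * z0_hat) / (1.0 - abar[t]).sqrt()
        a_prev = abar[t - 1] if t > 0 else torch.tensor(1.0)
        z = a_prev.sqrt() * z0_hat + (1.0 - a_prev).sqrt() * eps_hat  # DDIM-style step

    # "AutoRegress": commit to discrete tokens, conditioned on the refined plan.
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = decoder(torch.tensor([ids]), plan=z)   # hypothetical conditioning hook
        next_id = int(logits[0, -1].argmax())           # greedy decoding for brevity
        ids.append(next_id)
        if next_id == decoder.eos_token_id:             # hypothetical attribute
            break
    return ids
```

The key design point is the ordering: the diffusion loop finishes before any token is sampled, so the plan is settled globally in continuous space rather than being revised token by token.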

📝 Abstract
The Stop-Think-AutoRegress Language Diffusion Model (STAR-LDM) integrates latent diffusion planning with autoregressive generation. Unlike conventional autoregressive language models limited to token-by-token decisions, STAR-LDM incorporates a "thinking" phase that pauses generation to refine a semantic plan through diffusion before continuing. This enables global planning in continuous space prior to committing to discrete tokens. Evaluations show STAR-LDM significantly outperforms similar-sized models on language understanding benchmarks and achieves >70% win rates in LLM-as-judge comparisons for narrative coherence and commonsense reasoning. The architecture also allows straightforward control through lightweight classifiers, enabling fine-grained steering of attributes without model retraining while maintaining better fluency-control trade-offs than specialized approaches.
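
The abstract's claim of control through lightweight classifiers has a natural reading in latent space: because the plan is continuous, a small attribute classifier trained on plan latents can steer sampling by gradient ascent, with no language-model retraining. Below is a minimal sketch of that idea; the `classifier` module, its shapes, and the guidance scale are assumptions, not the paper's API.

```python
import torch

def guide_latent_plan(z, classifier, target_class, scale=2.0):
    """Nudge a latent plan toward a target attribute via a lightweight
    classifier, leaving the language model untouched. A sketch under
    assumptions: `classifier` is a hypothetical module mapping a plan
    latent (batch, plan_dim) to attribute logits (batch, num_classes)."""
    with torch.enable_grad():                 # works even inside no_grad sampling
        z = z.detach().requires_grad_(True)
        log_prob = classifier(z).log_softmax(dim=-1)[:, target_class].sum()
        (grad,) = torch.autograd.grad(log_prob, z)
    # One step of gradient ascent on log p(attribute | z).
    return (z + scale * grad).detach()
```

In a sampler like the one sketched above, this update would be applied to the intermediate latent at each denoising step (with gradients enabled for the classifier pass), mirroring classifier guidance in image diffusion; the `scale` knob trades control strength against fluency.
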
Problem

Research questions and friction points this paper is trying to address.

autoregressive language models
global planning
narrative coherence
commonsense reasoning
token-by-token generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

latent diffusion planning
autoregressive generation
global semantic planning
controllable text generation
language diffusion model
Justin Lovelace
PhD Student at Cornell University
Machine Learning, Natural Language Processing
Christian Belardi
Department of Computer Science, Cornell University
Sofian Zalouk
Department of Computer Science, Cornell University
Adhitya Polavaram
Department of Computer Science, Cornell University
Srivatsa Kundurthy
Department of Computer Science, Cornell University
Kilian Q. Weinberger
Department of Computer Science, Cornell University