🤖 AI Summary
Traditional autoregressive language models suffer from a lack of global semantic planning due to their token-by-token generation paradigm, often compromising coherence and commonsense reasoning in long-form text. This work proposes STAR-LDM, a novel architecture that integrates latent diffusion models into the autoregressive generation process. By introducing a "pause-and-think" mechanism, STAR-LDM optimizes a global semantic plan in continuous latent space before producing discrete tokens. The approach enables fine-grained attribute control without requiring retraining and significantly outperforms same-scale baselines on language understanding benchmarks. LLM-as-judge evaluations show that STAR-LDM achieves over 70% win rates in narrative coherence and commonsense reasoning, striking a superior balance between controllability and fluency.
📄 Abstract
The Stop-Think-AutoRegress Language Diffusion Model (STAR-LDM) integrates latent diffusion planning with autoregressive generation. Unlike conventional autoregressive language models limited to token-by-token decisions, STAR-LDM incorporates a "thinking" phase that pauses generation to refine a semantic plan through diffusion before continuing. This enables global planning in continuous space prior to committing to discrete tokens. Evaluations show STAR-LDM significantly outperforms similar-sized models on language understanding benchmarks and achieves $>70\%$ win rates in LLM-as-judge comparisons for narrative coherence and commonsense reasoning. The architecture also allows straightforward control through lightweight classifiers, enabling fine-grained steering of attributes without model retraining while maintaining better fluency-control trade-offs than specialized approaches.
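The stop-think-autoregress loop described in the abstract can be sketched in toy form. Everything below is hypothetical scaffolding rather than the authors' implementation: `denoise_step`, the shrink-toward-zero `denoiser`, the attribute `grad` function, and the plan-reading decoder are all stand-ins for the learned latent denoiser, the lightweight guidance classifier, and the actual autoregressive decoder.

```python
# Toy sketch of a "pause-and-think" generation loop (hypothetical names/logic;
# STAR-LDM itself uses a learned latent denoiser and trained classifiers).

def denoise_step(plan, denoiser, classifier_grad=None, scale=0.2):
    """One diffusion refinement step on the continuous latent plan,
    optionally nudged by a lightweight classifier's gradient so that
    attributes can be steered without retraining the base model."""
    refined = [denoiser(z) for z in plan]
    if classifier_grad is not None:
        grads = classifier_grad(refined)
        refined = [z + scale * g for z, g in zip(refined, grads)]
    return refined

def stop_think_autoregress(prompt_tokens, plan, n_new, think_steps=20):
    """Pause generation, refine a global semantic plan via diffusion
    ("think"), then resume token-by-token decoding conditioned on it."""
    denoiser = lambda z: 0.8 * z            # stand-in denoiser: shrink noise
    grad = lambda p: [1.0 - z for z in p]   # stand-in attribute gradient
    for _ in range(think_steps):            # the "thinking" phase
        plan = denoise_step(plan, denoiser, classifier_grad=grad)
    # "Autoregress": a toy decoder that just reads values off the plan.
    tokens = list(prompt_tokens)
    for i in range(n_new):
        tokens.append(round(plan[i % len(plan)], 3))
    return tokens, plan

tokens, plan = stop_think_autoregress(["<bos>"], [2.0, -1.0, 0.5], n_new=3)
```

In this toy, each thinking step contracts the latent toward a fixed point regardless of its noisy starting value, which mimics how iterative denoising commits to a coherent global plan before any discrete tokens are emitted.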