🤖 AI Summary
Traditional autoregressive language models suffer from a lack of global semantic planning due to their token-by-token generation paradigm, often compromising coherence and commonsense reasoning in long-form text. This work proposes STAR-LDM, a novel architecture that integrates latent diffusion models into the autoregressive generation process. By introducing a "pause-and-think" mechanism, STAR-LDM optimizes a global semantic plan in continuous latent space before producing discrete tokens. The approach enables fine-grained attribute control without requiring retraining and significantly outperforms same-scale baselines on language understanding benchmarks. LLM-as-judge evaluations show that STAR-LDM achieves over 70% win rates in narrative coherence and commonsense reasoning, striking a superior balance between controllability and fluency.
📄 Abstract
The Stop-Think-AutoRegress Language Diffusion Model (STAR-LDM) integrates latent diffusion planning with autoregressive generation. Unlike conventional autoregressive language models limited to token-by-token decisions, STAR-LDM incorporates a "thinking" phase that pauses generation to refine a semantic plan through diffusion before continuing. This enables global planning in continuous space prior to committing to discrete tokens. Evaluations show STAR-LDM significantly outperforms similar-sized models on language understanding benchmarks and achieves $>70\%$ win rates in LLM-as-judge comparisons for narrative coherence and commonsense reasoning. The architecture also allows straightforward control through lightweight classifiers, enabling fine-grained steering of attributes without model retraining while maintaining better fluency-control trade-offs than specialized approaches.
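The stop-think-autoregress loop described in the abstract can be sketched in toy form. Everything below is hypothetical scaffolding rather than the authors' implementation: `denoise_step`, the shrink-toward-zero `denoiser`, the attribute `grad` function, and the plan-reading decoder are all stand-ins for the learned latent denoiser, the lightweight guidance classifier, and the actual autoregressive decoder.

```python
# Toy sketch of a "pause-and-think" generation loop (hypothetical names/logic;
# STAR-LDM itself uses a learned latent denoiser and trained classifiers).

def denoise_step(plan, denoiser, classifier_grad=None, scale=0.2):
    """One diffusion refinement step on the continuous latent plan,
    optionally nudged by a lightweight classifier's gradient so that
    attributes can be steered without retraining the base model."""
    refined = [denoiser(z) for z in plan]
    if classifier_grad is not None:
        grads = classifier_grad(refined)
        refined = [z + scale * g for z, g in zip(refined, grads)]
    return refined

def stop_think_autoregress(prompt_tokens, plan, n_new, think_steps=20):
    """Pause generation, refine a global semantic plan via diffusion
    ("think"), then resume token-by-token decoding conditioned on it."""
    denoiser = lambda z: 0.8 * z            # stand-in denoiser: shrink noise
    grad = lambda p: [1.0 - z for z in p]   # stand-in attribute gradient
    for _ in range(think_steps):            # the "thinking" phase
        plan = denoise_step(plan, denoiser, classifier_grad=grad)
    # "Autoregress": a toy decoder that just reads values off the plan.
    tokens = list(prompt_tokens)
    for i in range(n_new):
        tokens.append(round(plan[i % len(plan)], 3))
    return tokens, plan

tokens, plan = stop_think_autoregress(["<bos>"], [2.0, -1.0, 0.5], n_new=3)
```

In this toy, each thinking step contracts the latent toward a fixed point regardless of its noisy starting value, which mimics how iterative denoising commits to a coherent global plan before any discrete tokens are emitted.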