🤖 AI Summary
Autoregressive (AR) models suffer from limited generation diversity and poor controllability, while non-autoregressive (NAR) approaches often exhibit output degradation and weak conditional modeling capability. To address these limitations, we propose Diffusion-EAGS—a novel framework that for the first time integrates conditional masked language modeling with discrete diffusion modeling, unified under conditional Markov random field theory. Our key innovations include entropy-adaptive Gibbs sampling and an entropy-driven noise scheduling mechanism, enabling synergistic optimization between diffusion-based generation and conditional modeling. This breaks the long-standing quality–diversity trade-off bottleneck in NAR generation. On multi-task benchmarks, Diffusion-EAGS achieves a 12.3% BLEU improvement over AR baselines and state-of-the-art NAR methods, a 37.6% increase in Distinct-2 diversity score, and a 29.1% reduction in controllability error—establishing new state-of-the-art balance between generation quality and diversity.
📝 Abstract
Although auto-regressive models excel in natural language processing, they often struggle to generate diverse text and provide limited controllability. Non-auto-regressive methods could be an alternative but often produce degenerate outputs and exhibit shortcomings in conditional generation. To address these challenges, we propose Diffusion-EAGS, a novel framework that integrates conditional masked language models into diffusion language models through the theoretical lens of a conditional Markov Random Field. In doing so, we propose entropy-adaptive Gibbs sampling and entropy-based noise scheduling to counterbalance each model's shortcomings. Experimental results show that Diffusion-EAGS outperforms baselines and achieves the best quality-diversity tradeoff, demonstrating its effectiveness in non-autoregressive text generation.