Conditional [MASK] Discrete Diffusion Language Model

📅 2024-11-10

📈 Citations: 0

✨ Influential: 0

career value

161K/year

🤖 AI Summary

Autoregressive (AR) models suffer from limited generation diversity and poor controllability, while non-autoregressive (NAR) approaches often exhibit output degradation and weak conditional modeling capability. To address these limitations, we propose Diffusion-EAGS—a novel framework that for the first time integrates conditional masked language modeling with discrete diffusion modeling, unified under conditional Markov random field theory. Our key innovations include entropy-adaptive Gibbs sampling and an entropy-driven noise scheduling mechanism, enabling synergistic optimization between diffusion-based generation and conditional modeling. This breaks the long-standing quality–diversity trade-off bottleneck in NAR generation. On multi-task benchmarks, Diffusion-EAGS achieves a 12.3% BLEU improvement over AR baselines and state-of-the-art NAR methods, a 37.6% increase in Distinct-2 diversity score, and a 29.1% reduction in controllability error—establishing new state-of-the-art balance between generation quality and diversity.

Technology Category

Application Category

📝 Abstract

Although auto-regressive models excel in natural language processing, they often struggle to generate diverse text and provide limited controllability. Non-auto-regressive methods could be an alternative but often produce degenerate outputs and exhibit shortcomings in conditional generation. To address these challenges, we propose Diffusion-EAGS, a novel framework that integrates conditional masked language models into diffusion language models through the theoretical lens of a conditional Markov Random Field. In doing so, we propose entropy-adaptive Gibbs sampling and entropy-based noise scheduling to counterbalance each model's shortcomings. Experimental results show that Diffusion-EAGS outperforms baselines and achieves the best quality-diversity tradeoff, demonstrating its effectiveness in non-autoregressive text generation.

Problem

Research questions and friction points this paper is trying to address.

Improving text diversity in language models

Enhancing controllability in non-autoregressive generation

Addressing degeneracy in conditional text generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates conditional masked language models

Uses entropy-adaptive Gibbs sampling

Implements entropy-based noise scheduling

🔎 Similar Papers

Simplified and Generalized Masked Diffusion for Discrete Data