Segment-Level Diffusion: A Framework for Controllable Long-Form Generation with Diffusion Language Models

📅 2024-12-15
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the poor coherence, contextual inaccuracy, and limited scalability of diffusion language models in long-text generation, this paper proposes a segment-level diffusion framework: input text is partitioned into semantically coherent segments, each modeled independently in latent space and decoded jointly. Segments, intermediate between token-level and document-level units, serve as the fundamental diffusion unit, decoupling diffusion-based prediction from autoregressive decoding to balance controllability and fluency. To make segment representations more robust, the framework incorporates adversarial and contrastive learning and a conditional latent-space guidance mechanism. Experiments on four benchmarks (XSum, ROCStories, DialogSum, and DeliData) show competitive or superior performance against diffusion and autoregressive baselines in fluency, coherence, and contextual consistency, under both automatic and human evaluation.

📝 Abstract
Diffusion models have shown promise in text generation but often struggle with generating long, coherent, and contextually accurate text. Token-level diffusion overlooks word-order dependencies and enforces short output windows, while passage-level diffusion struggles to learn robust representations for long-form text. To address these challenges, we propose Segment-Level Diffusion (SLD), a framework that enhances diffusion-based text generation through text segmentation, robust representation training with adversarial and contrastive learning, and improved latent-space guidance. By segmenting long-form outputs into separate latent representations and decoding them with an autoregressive decoder, SLD simplifies diffusion predictions and improves scalability. Experiments on XSum, ROCStories, DialogSum, and DeliData demonstrate that SLD achieves competitive or superior performance in fluency, coherence, and contextual compatibility across automatic and human evaluation metrics compared with other diffusion and autoregressive baselines. Ablation studies further validate the effectiveness of our segmentation and representation learning strategies.
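The pipeline the abstract describes (segment the long-form output, run diffusion over each segment's latent conditioned on prior context, then decode) can be sketched as a toy example. This is an illustration of the control flow only, not the paper's implementation: the class and function names are hypothetical, and the encoder and denoiser below are trivial numerical stand-ins for the learned models (the real SLD uses a trained latent diffusion model and an autoregressive decoder).

```python
import numpy as np


def segment_text(text, max_words=12):
    """Split long-form text into roughly equal word-count segments.
    (A stand-in for the paper's semantically coherent segmentation.)"""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


class ToySLD:
    """Minimal sketch of the Segment-Level Diffusion flow: each segment
    gets its own latent, diffusion denoises that latent conditioned on
    the previous segment, and a decoder would map latents back to text."""

    def __init__(self, dim=8, steps=5, seed=0):
        self.dim = dim
        self.steps = steps
        self.rng = np.random.default_rng(seed)

    def encode(self, segment):
        # Stub encoder: bucket words into a fixed-size latent vector.
        v = np.zeros(self.dim)
        for w in segment.split():
            v[hash(w) % self.dim] += 1.0
        return v / max(len(segment.split()), 1)

    def denoise(self, latent, context):
        # Toy reverse diffusion: start from a noised latent and
        # iteratively move toward the context-conditioned target.
        target = latent + 0.1 * context
        x = latent + self.rng.normal(scale=1.0, size=self.dim)
        for _ in range(self.steps):
            x = x + 0.5 * (target - x)
        return x

    def generate(self, text):
        # Independent latents per segment; each denoising step is
        # conditioned on the previous segment's latent for coherence.
        context = np.zeros(self.dim)
        latents = []
        for seg in segment_text(text):
            z = self.denoise(self.encode(seg), context)
            latents.append(z)
            context = z
        return latents
```

In the real framework the final step would pass each latent through an autoregressive decoder to produce fluent text, which is the decoupling of diffusion prediction from decoding that the abstract highlights.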
Problem

Research questions and friction points this paper is trying to address.

Improving long-form text coherence in diffusion models
Addressing weak word-order dependencies in token-level diffusion
Enhancing scalability and robustness for long-form generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Segment-Level Diffusion for long-form text
Representation training with adversarial and contrastive learning
Autoregressive decoder for latent representations