Discrete Diffusion Language Model for Efficient Text Summarization

📅 2024-06-25
📈 Citations: 2
Influential: 0
📄 PDF
🤖 AI Summary
Discrete diffusion models suffer from performance degradation in conditional long-text generation (e.g., abstractive summarization) due to incompatibility between their noise scheduling mechanisms and backbone architectures. To address this, we propose a semantic-aware noise process and a CrossMamba encoder-decoder architecture—the first successful integration of the Mamba state-space model into discrete diffusion frameworks. Our method synergistically combines discrete diffusion modeling, semantics-guided stochastic absorption noise, a hybrid Transformer-Mamba backbone, and non-autoregressive generation. Evaluated on Gigaword, CNN/DailyMail, and ArXiv, our approach achieves state-of-the-art ROUGE scores while significantly accelerating inference over autoregressive baselines. Results demonstrate both superior effectiveness and computational efficiency for long-sequence generation tasks.

📝 Abstract
While diffusion models excel at conditionally generating high-quality images, prior discrete diffusion models were not evaluated on conditional long-text generation. In this work, we address the limitations of prior discrete diffusion models for conditional long-text generation, particularly in long sequence-to-sequence tasks such as abstractive summarization. Despite decoding faster than autoregressive methods, previous diffusion models failed on the abstractive summarization task due to the incompatibility between their backbone architectures and the random noising process. To overcome these challenges, we introduce a novel semantic-aware noising process that enables Transformer backbones to handle long sequences effectively. Additionally, we propose CrossMamba, an adaptation of the Mamba model to the encoder-decoder paradigm, which integrates seamlessly with the random absorbing noising process. Our approaches achieve state-of-the-art performance on three benchmark summarization datasets: Gigaword, CNN/DailyMail, and ArXiv, outperforming existing discrete diffusion models on ROUGE metrics while also delivering much faster inference than autoregressive models.
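The "random absorbing noising process" mentioned in the abstract is the standard absorbing-state forward process for discrete diffusion: at each timestep, tokens are independently replaced by a special mask token with a schedule-dependent probability, until the whole sequence is absorbed. The sketch below is illustrative only; the linear schedule, mask-token id, and function name are assumptions for exposition, not the paper's actual implementation (which additionally guides the noise with semantics).

```python
import random

MASK_ID = 0  # hypothetical mask-token id, for illustration only

def absorb_noise(tokens, t, T, rng=random):
    """Absorbing-state forward noising (sketch): each token is independently
    replaced by MASK_ID with probability t/T (a simple linear schedule)."""
    keep_prob = 1.0 - t / T
    return [tok if rng.random() < keep_prob else MASK_ID for tok in tokens]

# At t=0 nothing is masked; at t=T the sequence is fully absorbed into MASK.
seq = [5, 9, 2, 7, 3]
print(absorb_noise(seq, 0, 10))   # identical to seq
print(absorb_noise(seq, 10, 10))  # all MASK_ID
```

A semantic-aware variant, as proposed in the paper, would replace the uniform per-token masking probability with one that depends on each token's semantic importance, rather than treating all positions identically.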
Problem

Research questions and friction points this paper is trying to address.

Address limitations of discrete diffusion models for long-text generation.
Improve abstractive summarization with semantic-aware noising process.
Enhance speed and performance on benchmark summarization datasets.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic-aware noising process for Transformer backbones
CrossMamba adaptation for encoder-decoder paradigm
State-of-the-art performance on summarization datasets
Authors
Do Huu Dat, VinUniversity (Machine Learning)
Do Duc Anh, Nanyang Technological University
A. Luu, Nanyang Technological University
W. Buntine, VinUniversity