TEncDM: Understanding the Properties of the Diffusion Model in the Space of Language Model Encodings

📅 2024-02-29
📈 Citations: 1
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the limited modeling capacity of diffusion models that operate in static embedding spaces for non-autoregressive text generation. It proposes TEncDM, a diffusion model that runs directly in the context-aware encoding space of pretrained language models (PLMs). Methodologically, the diffusion process is transferred from static token embeddings to PLM encodings, and a transformer-based decoder with self-conditioning is trained to map denoised latents back to tokens while taking surrounding context into account. Key contributions include: (i) diffusion modeling in the PLM encoding space rather than over static embeddings; (ii) a context-aware decoder for token prediction; and (iii) an analysis of how the encoder, decoder, noise scheduler, and self-conditioning affect zero-shot generation. Experiments show that TEncDM outperforms existing non-autoregressive diffusion models on three conditional generation tasks: QQP (paraphrasing), XSum (summarization), and Wiki-Auto (text simplification).
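To make the core idea concrete, here is a minimal sketch of the forward (noising) process applied to frozen PLM encodings instead of static token embeddings. It assumes a BERT-style encoder from Hugging Face transformers and a generic linear beta schedule; the model choice, schedule values, and function names are illustrative, not taken from the paper.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Frozen pretrained encoder: its context-aware encodings, not static
# token embeddings, define the latent space in which diffusion runs.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = AutoModel.from_pretrained("bert-base-uncased").eval()

def encode(texts):
    """Map raw text to contextual latents z_0 of shape (B, L, d)."""
    batch = tok(texts, return_tensors="pt", padding=True)
    with torch.no_grad():
        return enc(**batch).last_hidden_state

# Generic linear beta schedule; the paper tunes its own scheduler.
betas = torch.linspace(1e-4, 0.02, 1000)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def q_sample(z0, t):
    """Standard DDPM forward process on encoder latents:
    z_t = sqrt(abar_t) * z_0 + sqrt(1 - abar_t) * eps."""
    abar = alphas_cumprod[t].view(-1, 1, 1)
    eps = torch.randn_like(z0)
    return abar.sqrt() * z0 + (1.0 - abar).sqrt() * eps, eps

z0 = encode(["is this question a duplicate?"])
z_t, eps = q_sample(z0, torch.randint(0, 1000, (z0.size(0),)))
```

A denoising network is then trained to recover z_0 from z_t; at inference, the final estimate of z_0 is decoded back into tokens.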

πŸ“ Abstract
This paper presents the Text Encoding Diffusion Model (TEncDM), a novel approach to diffusion modeling that operates in the space of pre-trained language model encodings. In contrast to traditionally used embeddings, encodings integrate contextual information. In our approach, we also employ a transformer-based decoder, specifically designed to incorporate context in the token prediction process. We conduct a comprehensive examination of the influence of the encoder, decoder, noise scheduler, and self-conditioning on zero-shot generation. Furthermore, we compare TEncDM with previous approaches on three conditional text generation tasks: QQP, XSum, and Wiki-Auto. The results show that TEncDM exhibits superior performance compared to existing non-autoregressive diffusion models. Our code is available at https://github.com/M0RJIQUE/tencdm.
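The self-conditioning examined in the abstract can be illustrated by feeding the model's previous estimate of the clean latents back in as an extra input. A hedged sketch: the Denoiser module below, its dimensions, and the two-pass training trick are generic diffusion practice, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class Denoiser(nn.Module):
    """Predicts clean latents z_0 from noisy z_t. Self-conditioning
    concatenates the previous z_0 estimate to the input; timestep
    conditioning is omitted here for brevity."""
    def __init__(self, d=768, nhead=12, layers=6):
        super().__init__()
        self.proj = nn.Linear(2 * d, d)
        block = nn.TransformerEncoderLayer(d, nhead, batch_first=True)
        self.body = nn.TransformerEncoder(block, layers)
        self.out = nn.Linear(d, d)

    def forward(self, z_t, z0_prev=None):
        if z0_prev is None:  # first pass: no previous estimate yet
            z0_prev = torch.zeros_like(z_t)
        return self.out(self.body(self.proj(torch.cat([z_t, z0_prev], -1))))

# Common training recipe: occasionally run the model twice, detaching the
# first estimate before feeding it back as the self-condition.
model = Denoiser()
z_t = torch.randn(2, 16, 768)
z0_refined = model(z_t, model(z_t).detach())
```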
Problem

Research questions and friction points this paper is trying to address.

Diffusion models built on static token embeddings ignore contextual information, limiting generation quality
Non-autoregressive diffusion models still trail on conditional text generation tasks such as paraphrasing and summarization
The individual impact of the encoder, decoder, noise scheduler, and self-conditioning on zero-shot generation is poorly understood
Innovation

Methods, ideas, or system contributions that make the work stand out.

Moves the diffusion process into the context-aware encoding space of a pretrained language model
Transformer-based decoder that uses surrounding context during token prediction (see the sketch after this list)
Comprehensive analysis of encoder, decoder, noise scheduler, and self-conditioning choices for zero-shot generation
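As referenced in the list above, one way to picture the decoder contribution: rather than rounding each denoised latent to the nearest token independently, a small transformer attends over the whole latent sequence before emitting token logits. A sketch under assumed dimensions; LatentToTokenDecoder and its configuration are hypothetical.

```python
import torch
import torch.nn as nn

class LatentToTokenDecoder(nn.Module):
    """Maps denoised latents (B, L, d) to token logits (B, L, V).
    Self-attention lets each position see its neighbours, so token
    choices stay consistent across the sequence."""
    def __init__(self, d=768, vocab=30522, nhead=12, layers=3):
        super().__init__()
        block = nn.TransformerEncoderLayer(d, nhead, batch_first=True)
        self.ctx = nn.TransformerEncoder(block, layers)
        self.lm_head = nn.Linear(d, vocab)

    def forward(self, z0_hat):
        return self.lm_head(self.ctx(z0_hat))

decoder = LatentToTokenDecoder()
tokens = decoder(torch.randn(2, 16, 768)).argmax(-1)  # greedy readout
```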
Authors

Alexander Shabalin (HSE University)
Viacheslav Meshchaninov (Moscow State University)
Tingir Badmaev (HSE University)
Dmitry Molchanov (Independent researcher, Budva)
Grigory Bartosh (PhD candidate at University of Amsterdam; deep learning, generative models, diffusion models)
Sergey Markov (SberDevices)
Dmitry Vetrov (Professor of Computer Science, Constructor University; deep learning, Bayesian inference, graphical models)
Egor Chimbulatov
Vladislav Lapikov
Roman Kim