TEncDM: Understanding the Properties of the Diffusion Model in the Space of Language Model Encodings

📅 2024-02-29
📈 Citations: 1
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the limited modeling capacity of diffusion models that operate in static embedding spaces for non-autoregressive text generation. It proposes TEncDM, a diffusion model that runs directly in the context-aware encoding space of pretrained language models (PLMs). Methodologically, the diffusion process is transferred from static token embeddings to PLM encodings, and a transformer-based decoder with self-conditioning is trained to map denoised latents back to tokens while taking surrounding context into account. Key contributions include: (i) diffusion modeling in the PLM encoding space rather than over static embeddings; (ii) a context-aware decoder for token prediction; and (iii) an analysis of how the encoder, decoder, noise scheduler, and self-conditioning affect zero-shot generation. Experiments show that TEncDM outperforms existing non-autoregressive diffusion models on three conditional generation tasks: QQP (paraphrasing), XSum (summarization), and Wiki-Auto (text simplification).
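To make the core idea concrete, here is a minimal sketch of the forward (noising) process applied to frozen PLM encodings instead of static token embeddings. It assumes a BERT-style encoder from Hugging Face transformers and a generic linear beta schedule; the model choice, schedule values, and function names are illustrative, not taken from the paper.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Frozen pretrained encoder: its context-aware encodings, not static
# token embeddings, define the latent space in which diffusion runs.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = AutoModel.from_pretrained("bert-base-uncased").eval()

def encode(texts):
    """Map raw text to contextual latents z_0 of shape (B, L, d)."""
    batch = tok(texts, return_tensors="pt", padding=True)
    with torch.no_grad():
        return enc(**batch).last_hidden_state

# Generic linear beta schedule; the paper tunes its own scheduler.
betas = torch.linspace(1e-4, 0.02, 1000)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def q_sample(z0, t):
    """Standard DDPM forward process on encoder latents:
    z_t = sqrt(abar_t) * z_0 + sqrt(1 - abar_t) * eps."""
    abar = alphas_cumprod[t].view(-1, 1, 1)
    eps = torch.randn_like(z0)
    return abar.sqrt() * z0 + (1.0 - abar).sqrt() * eps, eps

z0 = encode(["is this question a duplicate?"])
z_t, eps = q_sample(z0, torch.randint(0, 1000, (z0.size(0),)))
```

A denoising network is then trained to recover z_0 from z_t; at inference, the final estimate of z_0 is decoded back into tokens.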

πŸ“ Abstract
This paper presents the Text Encoding Diffusion Model (TEncDM), a novel approach to diffusion modeling that operates in the space of pre-trained language model encodings. In contrast to traditionally used embeddings, encodings integrate contextual information. In our approach, we also employ a transformer-based decoder, specifically designed to incorporate context in the token prediction process. We conduct a comprehensive examination of the influence of the encoder, decoder, noise scheduler, and self-conditioning on zero-shot generation. Furthermore, we compare TEncDM with previous approaches on three conditional text generation tasks: QQP, XSum, and Wiki-Auto. The results show that TEncDM exhibits superior performance compared to existing non-autoregressive diffusion models. Our code is available at https://github.com/M0RJIQUE/tencdm.
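The self-conditioning examined in the abstract can be illustrated by feeding the model's previous estimate of the clean latents back in as an extra input. A hedged sketch: the Denoiser module below, its dimensions, and the two-pass training trick are generic diffusion practice, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class Denoiser(nn.Module):
    """Predicts clean latents z_0 from noisy z_t. Self-conditioning
    concatenates the previous z_0 estimate to the input; timestep
    conditioning is omitted here for brevity."""
    def __init__(self, d=768, nhead=12, layers=6):
        super().__init__()
        self.proj = nn.Linear(2 * d, d)
        block = nn.TransformerEncoderLayer(d, nhead, batch_first=True)
        self.body = nn.TransformerEncoder(block, layers)
        self.out = nn.Linear(d, d)

    def forward(self, z_t, z0_prev=None):
        if z0_prev is None:  # first pass: no previous estimate yet
            z0_prev = torch.zeros_like(z_t)
        return self.out(self.body(self.proj(torch.cat([z_t, z0_prev], -1))))

# Common training recipe: occasionally run the model twice, detaching the
# first estimate before feeding it back as the self-condition.
model = Denoiser()
z_t = torch.randn(2, 16, 768)
z0_refined = model(z_t, model(z_t).detach())
```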
Problem

Research questions and friction points this paper is trying to address.

Diffusion models built on static token embeddings ignore contextual information, limiting generation quality
Non-autoregressive diffusion models still trail on conditional text generation tasks such as paraphrasing and summarization
The individual impact of the encoder, decoder, noise scheduler, and self-conditioning on zero-shot generation is poorly understood
Innovation

Methods, ideas, or system contributions that make the work stand out.

Moves the diffusion process into the context-aware encoding space of a pretrained language model
Transformer-based decoder that uses surrounding context during token prediction (see the sketch after this list)
Comprehensive analysis of encoder, decoder, noise scheduler, and self-conditioning choices for zero-shot generation
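As referenced in the list above, one way to picture the decoder contribution: rather than rounding each denoised latent to the nearest token independently, a small transformer attends over the whole latent sequence before emitting token logits. A sketch under assumed dimensions; LatentToTokenDecoder and its configuration are hypothetical.

```python
import torch
import torch.nn as nn

class LatentToTokenDecoder(nn.Module):
    """Maps denoised latents (B, L, d) to token logits (B, L, V).
    Self-attention lets each position see its neighbours, so token
    choices stay consistent across the sequence."""
    def __init__(self, d=768, vocab=30522, nhead=12, layers=3):
        super().__init__()
        block = nn.TransformerEncoderLayer(d, nhead, batch_first=True)
        self.ctx = nn.TransformerEncoder(block, layers)
        self.lm_head = nn.Linear(d, vocab)

    def forward(self, z0_hat):
        return self.lm_head(self.ctx(z0_hat))

decoder = LatentToTokenDecoder()
tokens = decoder(torch.randn(2, 16, 768)).argmax(-1)  # greedy readout
```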
Authors

Alexander Shabalin (HSE University)
Viacheslav Meshchaninov (Moscow State University)
Tingir Badmaev (HSE University)
Dmitry Molchanov (Independent researcher, Budva)
Grigory Bartosh (PhD candidate at University of Amsterdam; deep learning, generative models, diffusion models)
Sergey Markov (SberDevices)
Dmitry Vetrov (Professor of Computer Science, Constructor University; deep learning, Bayesian inference, graphical models)
Egor Chimbulatov
Vladislav Lapikov
Roman Kim