Smoothie: Smoothing Diffusion on Token Embeddings for Text Generation

📅 2025-05-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Text diffusion models face a tension between discreteness and semantic continuity: Gaussian diffusion in continuous latent spaces preserves semantic structure but struggles with token decoding, whereas diffusion on the categorical simplex respects discreteness but disregards semantic relations between tokens. To address this, the paper proposes Smoothing Diffusion on Token Embeddings (Smoothie), a diffusion method that progressively smooths token embeddings according to semantic similarity, so that information is removed gradually while decoding back to tokens stays natural. On several sequence-to-sequence generation tasks, Smoothie outperforms existing diffusion-based text models in generation quality. Ablation studies show that the proposed diffusion space performs better than both the standard embedding space and the categorical simplex.

📝 Abstract

Diffusion models have achieved state-of-the-art performance in generating images, audio, and video, but their adaptation to text remains challenging due to its discrete nature. Prior approaches either apply Gaussian diffusion in continuous latent spaces, which inherits semantic structure but struggles with token decoding, or operate in the categorical simplex space, which respects discreteness but disregards semantic relations between tokens. In this paper, we propose Smoothing Diffusion on Token Embeddings (Smoothie), a novel diffusion method that combines the strengths of both approaches by progressively smoothing token embeddings based on semantic similarity. This technique enables gradual information removal while maintaining a natural decoding process. Experimental results on several sequence-to-sequence generation tasks demonstrate that Smoothie outperforms existing diffusion-based models in generation quality. Furthermore, ablation studies show that our proposed diffusion space yields better performance than both the standard embedding space and the categorical simplex. Our code is available at https://github.com/ashaba1in/smoothie.
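
The abstract does not spell out the smoothing rule, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each token is represented by a softmax over negative squared Euclidean distances between its embedding and every vocabulary embedding, with a bandwidth `sigma` acting as the noise level. The function names (`smoothing_weights`, `smooth`, `decode`), the toy vocabulary, and the choice of kernel are all hypothetical; the linked repository is the place to check the actual method.

```python
import numpy as np


def smoothing_weights(token_ids, embeddings, sigma):
    """Similarity weights of each token over the whole vocabulary.

    Assumption: weights are a softmax over negative squared Euclidean
    distances, scaled by 2 * sigma**2. Small sigma gives near one-hot
    rows; large sigma gives near uniform rows.
    """
    x = embeddings[token_ids]  # (seq_len, dim)
    # Pairwise squared distances, shape (seq_len, vocab_size).
    d2 = ((x[:, None, :] - embeddings[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2 / (2.0 * sigma ** 2)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)


def smooth(token_ids, embeddings, sigma):
    """Smoothed embeddings: similarity-weighted average of vocabulary embeddings."""
    return smoothing_weights(token_ids, embeddings, sigma) @ embeddings


def decode(weights):
    """Map each row of weights back to the most similar token id."""
    return weights.argmax(axis=1)


# Toy vocabulary of four 2-D embeddings (hypothetical data).
vocab = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
tokens = np.array([1, 3])

for sigma in (0.05, 1.0, 100.0):
    w = smoothing_weights(tokens, vocab, sigma)
    print(f"sigma={sigma}: weights={np.round(w, 3).tolist()}, decoded={decode(w).tolist()}")
```

In this sketch, raising `sigma` spreads each token's weight first to its nearest neighbors in embedding space and only later to distant tokens, which is one way to picture information being removed gradually along semantic lines. Decoding remains a plain argmax because, without added noise, a token's own embedding is always at distance zero.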

Problem

Research questions and friction points this paper is trying to address.

Adapting diffusion models to discrete text generation

Balancing semantic structure and token decoding in diffusion

Improving generation quality in sequence-to-sequence tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Smoothing token embeddings via semantic similarity

Combining continuous and categorical diffusion strengths

Maintaining natural decoding during information removal

🔎 Similar Papers

TEncDM: Understanding the Properties of the Diffusion Model in the Space of Language Model Encodings