🤖 AI Summary
Text diffusion models face a fundamental tension between discreteness and semantic continuity: Gaussian diffusion in continuous latent spaces preserves semantic structure but makes precise token decoding difficult, whereas modeling on the categorical simplex respects discreteness but ignores semantic relationships between tokens. To address this, the authors propose Smoothing Diffusion on Token Embeddings (Smoothie), a diffusion method that incorporates semantic similarity directly into the diffusion process by progressively smoothing token embeddings. This removes information gradually while keeping decoding back to tokens natural, so that semantic coherence and token decodability are handled together. Experiments on several sequence-to-sequence generation tasks show that Smoothie outperforms existing diffusion-based text models in generation quality. Ablation studies show that the proposed diffusion space, which is guided by semantic structure, yields better performance than both the standard embedding space and the categorical simplex.
📝 Abstract
Diffusion models have achieved state-of-the-art performance in generating images, audio, and video, but their adaptation to text remains challenging due to its discrete nature. Prior approaches either apply Gaussian diffusion in continuous latent spaces, which inherits semantic structure but struggles with token decoding, or operate in the categorical simplex space, which respects discreteness but disregards semantic relations between tokens. In this paper, we propose Smoothing Diffusion on Token Embeddings (Smoothie), a novel diffusion method that combines the strengths of both approaches by progressively smoothing token embeddings based on semantic similarity. This technique enables gradual information removal while maintaining a natural decoding process. Experimental results on several sequence-to-sequence generation tasks demonstrate that Smoothie outperforms existing diffusion-based models in generation quality. Furthermore, ablation studies show that our proposed diffusion space yields better performance than both the standard embedding space and the categorical simplex. Our code is available at https://github.com/ashaba1in/smoothie.
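To make "progressively smoothing token embeddings based on semantic similarity" concrete, here is a minimal illustrative sketch. It is not the authors' implementation: the distance-based softmax, the names `smooth_token` and `decode`, the parameter `sigma`, and the toy embedding table are all assumptions chosen for illustration, and the sketch leaves out any stochastic noise a real diffusion process would add. It shows only the qualitative behaviour the abstract describes: as the smoothing level grows, a token's representation spreads from nearly one-hot toward nearly uniform, with semantically closer tokens receiving more weight, while an argmax still recovers the original token.

```python
import numpy as np

def smooth_token(token_id, embeddings, sigma):
    """Return similarity weights over the vocabulary for one token.

    Hypothetical smoothing rule (an assumption, not taken from the paper):
    a softmax over negative squared Euclidean distances between the token's
    embedding and every vocabulary embedding, with sigma as the bandwidth.
    """
    diffs = embeddings - embeddings[token_id]
    sq_dist = np.sum(diffs ** 2, axis=1)
    logits = -sq_dist / (2.0 * sigma ** 2)
    logits = logits - logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

def decode(weights):
    """Map a smoothed representation back to a token id via argmax."""
    return int(np.argmax(weights))

# Toy embedding table (hypothetical): tokens 1 and 2 are close to token 0,
# token 3 is far away.
embeddings = np.array([[0.0, 0.0],
                       [1.0, 0.0],
                       [0.0, 1.0],
                       [3.0, 3.0]])

for sigma in (0.1, 1.0, 10.0):
    w = smooth_token(0, embeddings, sigma)
    print(f"sigma={sigma}: decoded={decode(w)}, weights={np.round(w, 3)}")
```

Under these assumptions, a small `sigma` leaves almost all weight on the original token, and a large `sigma` removes most of the information about which token was chosen. Decoding stays a simple argmax at every level, which is the kind of natural decoding the abstract contrasts with Gaussian diffusion in latent space.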