🤖 AI Summary
Gaussian diffusion models struggle to model discrete data embedded as mixtures of Dirac deltas in continuous space: within a critical interval of the reverse process, the density of the noisified data becomes multimodal, and the stochastic DDPM solver occasionally steps into the low-density regions between modes, feeding the model out-of-distribution inputs and degrading sample quality. This work traces that failure mechanism on a toy Random Hierarchy Model and proposes a remedy that combines self-conditioning with switching from DDPM to a solver the authors term q-sampling inside the critical interval. The approach improves sample quality on both conditional and unconditional generation tasks across text, programming code, and proteins.
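To make the multimodality concrete, here is a minimal 1D illustration (our sketch, not code from the paper): a two-symbol discrete distribution is embedded as Dirac masses at ±1 and noisified by a variance-preserving forward process `x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps`, so the marginal of `x_t` is a two-component Gaussian mixture. At low noise levels the region between the two modes is a deep low-density valley, which is exactly where a stochastic solver step can land.

```python
import numpy as np

# Two-symbol discrete distribution embedded as Dirac masses at x0 = -1 and +1.
# Under the variance-preserving forward process the marginal q(x_t) is a
# two-component Gaussian mixture with means ±sqrt(abar) and std sqrt(1 - abar).

def marginal_density(x, abar):
    """Density of x_t when x0 is uniform on {-1, +1}."""
    std = np.sqrt(1.0 - abar)
    comp = lambda mu: np.exp(-0.5 * ((x - mu) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))
    return 0.5 * comp(np.sqrt(abar)) + 0.5 * comp(-np.sqrt(abar))

for abar in [0.01, 0.3, 0.7, 0.99]:
    valley = marginal_density(0.0, abar)            # midpoint between the modes
    peak = marginal_density(np.sqrt(abar), abar)    # at a component mean
    print(f"abar={abar:4.2f}  p(midpoint)/p(component mean) = {valley / peak:.3f}")
# High noise (small abar): ratio near 1, the marginal is effectively unimodal.
# Low noise (large abar): ratio near 0, a deep valley separates the modes; a
# DDPM step landing there is an out-of-distribution input for the denoiser.
```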
📝 Abstract
Diffusion models have become a standard approach for generative modeling in continuous domains, yet their application to discrete data remains challenging. We investigate why Gaussian diffusion models with the DDPM solver struggle to sample from discrete distributions that are represented as a mixture of delta distributions in continuous space. Using a toy Random Hierarchy Model, we identify a critical sampling interval in which the density of noisified data becomes multimodal. In this regime, DDPM occasionally enters low-density regions between modes, producing out-of-distribution inputs for the model and degrading sample quality. We show that existing heuristics, including self-conditioning and a solver we term q-sampling, help alleviate this issue. Furthermore, we demonstrate that combining self-conditioning with switching from DDPM to q-sampling within the critical interval improves generation quality on real data. We validate these findings across conditional and unconditional tasks in multiple domains, including text, programming code, and proteins.
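As a schematic of the switching scheme, the sketch below assumes (our reading, not confirmed by the abstract) that q-sampling means drawing `x_{t-1}` directly from the forward marginal `q(x_{t-1} | x0_hat)` of the model's clean-data estimate, and that self-conditioning feeds the previous estimate back to the denoiser as an extra input; `model`, `abar`, and the `critical` interval endpoints are hypothetical placeholders for the paper's actual components.

```python
import numpy as np

def hybrid_sample(model, T, abar, critical, x_dim, rng):
    """Hybrid sampler sketch: DDPM ancestral steps everywhere except the
    critical interval, where q-sampling re-noises the clean estimate.

    model(x_t, t, x0_prev) -> x0_hat   hypothetical self-conditioned denoiser
    abar: array of length T + 1 with abar[0] ~= 1 (cumulative alpha schedule)
    critical: (t_lo, t_hi) pair standing in for the paper's critical interval
    """
    x = rng.standard_normal(x_dim)           # x_T ~ N(0, I)
    x0_prev = np.zeros(x_dim)                # self-conditioning input
    for t in range(T, 0, -1):
        x0_hat = model(x, t, x0_prev)        # predict clean data
        x0_prev = x0_hat                     # feed back at the next step
        a_t, a_s = abar[t], abar[t - 1]
        if critical[0] <= t <= critical[1]:
            # q-sampling: x_{t-1} ~ q(x_{t-1} | x0_hat); discarding x_t keeps
            # the sample tied to a mode instead of the valley between modes.
            x = np.sqrt(a_s) * x0_hat + np.sqrt(1.0 - a_s) * rng.standard_normal(x_dim)
        else:
            # Standard DDPM ancestral step: x_{t-1} ~ q(x_{t-1} | x_t, x0_hat).
            beta_t = 1.0 - a_t / a_s
            mean = (np.sqrt(a_s) * beta_t * x0_hat
                    + np.sqrt(a_t / a_s) * (1.0 - a_s) * x) / (1.0 - a_t)
            var = (1.0 - a_s) / (1.0 - a_t) * beta_t
            x = mean + np.sqrt(var) * rng.standard_normal(x_dim)
    return x0_prev  # final clean estimate, rounded back to discrete symbols downstream
```

The intuition under these assumptions: a q-sampling step can only land near the mode selected by the current clean estimate, so it cannot wander into the low-density valley, while the standard DDPM step is kept outside the critical interval.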