🤖 AI Summary
To address the degradation in sample quality of DDIM under low-step sampling, this work replaces the standard Gaussian reverse transition kernel with a Gaussian Mixture Model (GMM) whose parameters are constrained to match the first- and second-order central moments of the DDPM forward marginals. This is the first integration of a moment-matching GMM into the DDIM framework, and it operates on pre-trained models without additional network parameters or extra training. On ImageNet 256×256 with only 10 sampling steps, the method achieves an FID of 6.94 (versus 10.15 for the Gaussian-kernel DDIM baseline, a 3.21 improvement) and an Inception Score of 207.85 (versus 196.73, an 11.12 gain). The approach pairs a simple theoretical basis, moment-based distribution matching, with practical efficiency, offering a way to accelerate diffusion model inference without compromising sample quality.
📝 Abstract
We propose using a Gaussian Mixture Model (GMM) as the reverse transition operator (kernel) within the Denoising Diffusion Implicit Models (DDIM) framework, which is one of the most widely used approaches for accelerated sampling from pre-trained Denoising Diffusion Probabilistic Models (DDPM). Specifically, we match the first- and second-order central moments of the DDPM forward marginals by constraining the parameters of the GMM. We see that moment matching is sufficient to obtain samples of equal or better quality than the original DDIM with Gaussian kernels. We provide experimental results with unconditional models trained on CelebAHQ and FFHQ and with class-conditional models trained on ImageNet. Our results suggest that using the GMM kernel leads to significant improvements in the quality of the generated samples when the number of sampling steps is small, as measured by the FID and IS metrics. For example, on ImageNet 256×256 with 10 sampling steps, we achieve an FID of 6.94 and an IS of 207.85 with a GMM kernel, compared to 10.15 and 196.73, respectively, with a Gaussian kernel.
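To make the moment-matching constraint concrete, the following is a minimal NumPy sketch, not the paper's actual parameterization. It assumes a diagonal covariance and builds a GMM whose overall mean and per-dimension variance equal a given target mean and variance. The function names (`moment_matched_gmm`, `sample_gmm`), the choice of a single variance vector shared by all components, and the 0.9 shrink factor are all illustrative assumptions. The sketch relies on two identities for a mixture with weights π_k, means μ + δ_k, and shared component variance s²: the mixture mean is μ when Σ_k π_k δ_k = 0, and the mixture variance is s² + Σ_k π_k δ_k².

```python
import numpy as np

def moment_matched_gmm(mu, var, weights, raw_offsets):
    """Build a diagonal GMM whose mean is `mu` and per-dimension variance is `var`.

    Hypothetical helper for illustration only; not the paper's construction.
    mu: (d,) target mean; var: scalar or (d,) target variance (> 0);
    weights: (K,) positive mixing weights; raw_offsets: (K, d) free offsets.
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()            # normalize mixing weights
    raw = np.asarray(raw_offsets, dtype=float)

    # First-moment constraint: center offsets so that sum_k pi_k * delta_k = 0.
    offsets = raw - weights @ raw

    # Variance contributed by the spread of the component means, per dimension.
    between = weights @ offsets**2

    # If the means alone would exceed the target variance, shrink the offsets
    # (the 0.9 factor is an arbitrary choice that keeps component variance > 0).
    max_ratio = np.max(between / var)
    if max_ratio >= 1.0:
        offsets = offsets * np.sqrt(0.9 / max_ratio)
        between = weights @ offsets**2

    # Second-moment constraint: shared component variance fills the remainder.
    comp_var = var - between
    means = np.asarray(mu, dtype=float) + offsets
    return weights, means, comp_var

def sample_gmm(rng, weights, means, comp_var, n):
    """Draw n samples: pick a component per sample, then add Gaussian noise."""
    ks = rng.choice(len(weights), size=n, p=weights)
    noise = rng.standard_normal((n, means.shape[1]))
    return means[ks] + np.sqrt(comp_var) * noise

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu, var = np.array([1.0, -2.0]), 0.5
    w, m, cv = moment_matched_gmm(mu, var, [0.2, 0.3, 0.5], rng.normal(size=(3, 2)))
    x = sample_gmm(rng, w, m, cv, 200_000)
    print("empirical mean:", x.mean(axis=0))
    print("empirical var: ", x.var(axis=0))
```

Under these assumptions, the resulting mixture has the same first two moments as the single Gaussian it stands in for, while the offsets and weights remain free to shape the distribution beyond those moments.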