🤖 AI Summary
To address the instability of score estimation across multiple noise levels and the high sensitivity to hyperparameters in single-step generative modeling, this paper proposes the Score-of-Mixture Training (SMT) framework. SMT directly estimates the score function of the mixture distribution of real and generated samples across diverse noise levels while minimizing an α-skew Jensen–Shannon divergence, establishing a principled paradigm for mixture-distribution score estimation. The method supports both training from scratch and knowledge distillation from pretrained diffusion models, termed Score-of-Mixture Distillation (SMD), without requiring intricate noise scheduling, and it exhibits strong robustness to hyperparameter variation and stable training. On CIFAR-10 and ImageNet 64×64, SMT achieves FID and Inception Score (IS) on par with or superior to state-of-the-art single-step methods, while significantly improving training efficiency and practical applicability.
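The central quantity above, the score of a mixture of real and generated distributions, satisfies a simple identity: the mixture score is a density-weighted (posterior-responsibility) combination of the component scores, ∇ log m = (α p ∇ log p + (1−α) q ∇ log q) / m for m = αp + (1−α)q. A minimal 1-D Gaussian sketch of this identity (the densities and parameters here are illustrative, not taken from the paper):

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    # density of N(mu, sigma^2)
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def gauss_score(x, mu, sigma):
    # score of a Gaussian: d/dx log N(x; mu, sigma^2) = -(x - mu) / sigma^2
    return -(x - mu) / sigma ** 2

def mixture_score(x, alpha, mu_p, s_p, mu_q, s_q):
    # Score of the mixture m = alpha*p + (1-alpha)*q via
    #   grad log m = (alpha*p*grad log p + (1-alpha)*q*grad log q) / m,
    # i.e. component scores weighted by posterior responsibilities.
    p, q = gauss_pdf(x, mu_p, s_p), gauss_pdf(x, mu_q, s_q)
    m = alpha * p + (1 - alpha) * q
    return (alpha * p * gauss_score(x, mu_p, s_p)
            + (1 - alpha) * q * gauss_score(x, mu_q, s_q)) / m
```

Checking the identity against a finite-difference derivative of log m confirms the two agree pointwise; SMT learns this same quantity with a neural network instead of closed-form densities.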
📝 Abstract
We propose Score-of-Mixture Training (SMT), a novel framework for training one-step generative models by minimizing a class of divergences called the $\alpha$-skew Jensen–Shannon divergence. At its core, SMT estimates the score of mixture distributions between real and fake samples across multiple noise levels. Similar to consistency models, our approach supports both training from scratch (SMT) and distillation using a pretrained diffusion model, which we call Score-of-Mixture Distillation (SMD). It is simple to implement, requires minimal hyperparameter tuning, and ensures stable training. Experiments on CIFAR-10 and ImageNet 64×64 show that SMT/SMD are competitive with and can even outperform existing methods.
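The $\alpha$-skew Jensen–Shannon divergence interpolates between KL-like endpoints by mixing toward a skewed midpoint. A minimal numerical sketch under one common convention, $m = (1-\alpha)p + \alpha q$ with $\mathrm{JS}^{(\alpha)} = (1-\alpha)\,\mathrm{KL}(p\,\|\,m) + \alpha\,\mathrm{KL}(q\,\|\,m)$, which recovers the standard Jensen–Shannon divergence at $\alpha = 1/2$ (the paper's exact definition may differ in skew placement):

```python
import numpy as np

def kl(p, q):
    # KL divergence between discrete distributions (natural log)
    return float(np.sum(p * np.log(p / q)))

def alpha_skew_js(p, q, alpha):
    # One common convention for the alpha-skew Jensen-Shannon divergence:
    # mix toward m = (1-alpha)*p + alpha*q; at alpha = 0.5 this reduces
    # to the standard (symmetric) Jensen-Shannon divergence.
    m = (1 - alpha) * p + alpha * q
    return (1 - alpha) * kl(p, m) + alpha * kl(q, m)

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.1, 0.3, 0.6])
print(alpha_skew_js(p, q, 0.5))  # standard JSD between p and q
print(alpha_skew_js(p, q, 0.1))  # skewed toward p
```

Because both KL terms compare against the mixture $m$, every log argument stays bounded even when $p$ and $q$ have nearly disjoint support, which is part of why skewed divergences are attractive for training generative models.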