🤖 AI Summary
To address the instability of score estimation across multiple noise levels and the high sensitivity to hyperparameters in single-step generative modeling, this paper proposes the Score-of-Mixture Training (SMT) framework. SMT directly estimates the score function of the mixture distribution of real and generated samples across diverse noise levels while minimizing an α-skew Jensen–Shannon divergence, establishing a principled paradigm for mixture-distribution score estimation. The method supports both training from scratch and knowledge distillation from pretrained diffusion models, termed Score-of-Mixture Distillation (SMD), without requiring intricate noise scheduling, and it exhibits strong robustness to hyperparameter variation and stable training. On CIFAR-10 and ImageNet 64×64, SMT achieves FID and Inception Score (IS) on par with or superior to state-of-the-art single-step methods, while significantly improving training efficiency and practical applicability.
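The central quantity above, the score of a mixture of real and generated distributions, satisfies a simple identity: the mixture score is a density-weighted (posterior-responsibility) combination of the component scores, ∇ log m = (α p ∇ log p + (1−α) q ∇ log q) / m for m = αp + (1−α)q. A minimal 1-D Gaussian sketch of this identity (the densities and parameters here are illustrative, not taken from the paper):

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    # density of N(mu, sigma^2)
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def gauss_score(x, mu, sigma):
    # score of a Gaussian: d/dx log N(x; mu, sigma^2) = -(x - mu) / sigma^2
    return -(x - mu) / sigma ** 2

def mixture_score(x, alpha, mu_p, s_p, mu_q, s_q):
    # Score of the mixture m = alpha*p + (1-alpha)*q via
    #   grad log m = (alpha*p*grad log p + (1-alpha)*q*grad log q) / m,
    # i.e. component scores weighted by posterior responsibilities.
    p, q = gauss_pdf(x, mu_p, s_p), gauss_pdf(x, mu_q, s_q)
    m = alpha * p + (1 - alpha) * q
    return (alpha * p * gauss_score(x, mu_p, s_p)
            + (1 - alpha) * q * gauss_score(x, mu_q, s_q)) / m
```

Checking the identity against a finite-difference derivative of log m confirms the two agree pointwise; SMT learns this same quantity with a neural network instead of closed-form densities.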
📝 Abstract
We propose Score-of-Mixture Training (SMT), a novel framework for training one-step generative models by minimizing a class of divergences called the $\alpha$-skew Jensen–Shannon divergence. At its core, SMT estimates the score of mixture distributions between real and fake samples across multiple noise levels. Similar to consistency models, our approach supports both training from scratch (SMT) and distillation using a pretrained diffusion model, which we call Score-of-Mixture Distillation (SMD). It is simple to implement, requires minimal hyperparameter tuning, and ensures stable training. Experiments on CIFAR-10 and ImageNet 64×64 show that SMT/SMD are competitive with and can even outperform existing methods.
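The $\alpha$-skew Jensen–Shannon divergence interpolates between KL-like endpoints by mixing toward a skewed midpoint. A minimal numerical sketch under one common convention, $m = (1-\alpha)p + \alpha q$ with $\mathrm{JS}^{(\alpha)} = (1-\alpha)\,\mathrm{KL}(p\,\|\,m) + \alpha\,\mathrm{KL}(q\,\|\,m)$, which recovers the standard Jensen–Shannon divergence at $\alpha = 1/2$ (the paper's exact definition may differ in skew placement):

```python
import numpy as np

def kl(p, q):
    # KL divergence between discrete distributions (natural log)
    return float(np.sum(p * np.log(p / q)))

def alpha_skew_js(p, q, alpha):
    # One common convention for the alpha-skew Jensen-Shannon divergence:
    # mix toward m = (1-alpha)*p + alpha*q; at alpha = 0.5 this reduces
    # to the standard (symmetric) Jensen-Shannon divergence.
    m = (1 - alpha) * p + alpha * q
    return (1 - alpha) * kl(p, m) + alpha * kl(q, m)

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.1, 0.3, 0.6])
print(alpha_skew_js(p, q, 0.5))  # standard JSD between p and q
print(alpha_skew_js(p, q, 0.1))  # skewed toward p
```

Because both KL terms compare against the mixture $m$, every log argument stays bounded even when $p$ and $q$ have nearly disjoint support, which is part of why skewed divergences are attractive for training generative models.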