ScoreFusion: fusing score-based generative models via Kullback-Leibler barycenters

📅 2024-06-28
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Generative modeling for target populations with scarce observational data remains challenging. Method: the paper proposes a KL-divergence-optimal weighted fusion of multiple pre-trained diffusion models, applying KL barycenter theory to diffusion model ensembling for the first time and obtaining a learnable score-matching fusion framework with dimension-free total variation error bounds. The method is compatible with mainstream architectures (e.g., Stable Diffusion) and supports efficient sampling adaptation. Results: experiments show markedly improved sample efficiency on MNIST in few-shot settings; in portrait generation, the method broadens distributional coverage and enhances facial diversity relative to the individual auxiliary models. Core contributions: a theory-driven diffusion-model fusion paradigm and a differentiable barycenter implementation that enables principled, scalable ensemble learning.

📝 Abstract
We introduce ScoreFusion, a theoretically grounded method for fusing multiple pre-trained diffusion models that are assumed to generate from auxiliary populations. ScoreFusion is particularly useful for enhancing the generative modeling of a target population with limited observed data. Our starting point considers the family of KL barycenters of the auxiliary populations, which is proven to be an optimal parametric class in the KL sense, but difficult to learn. Nevertheless, by recasting the learning problem as score matching in denoising diffusion, we obtain a tractable way of computing the optimal KL barycenter weights. We prove a dimension-free sample complexity bound in total variation distance, provided that the auxiliary models are well-fitted for their own task and the auxiliary tasks combined capture the target well. The sample efficiency of ScoreFusion is demonstrated by learning handwritten digits. We also provide a simple adaptation of a Stable Diffusion denoising pipeline that enables sampling from the KL barycenter of two auxiliary checkpoints; on a portrait generation task, our method produces faces that enhance population heterogeneity relative to the auxiliary distributions.
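The fusion described in the abstract takes a convex combination of the auxiliary models' score functions and learns the combination weights by score matching on the scarce target data. A minimal 1-D sketch of that idea, not the paper's implementation: the two "auxiliary models" below are the exact scores of two Gaussians, the fused score is their weighted sum, and the weight is chosen by minimizing a Hyvarinen-style score-matching objective on target samples (the function names and the grid search are illustrative choices, not from the paper).

```python
import numpy as np

def score_gauss(x, mu, sigma=1.0):
    # d/dx log N(x; mu, sigma^2) -- stand-in for a pre-trained score model
    return (mu - x) / sigma**2

def fused_score(x, w):
    # KL-barycenter-style fusion: convex combination of auxiliary scores
    return w * score_gauss(x, -2.0) + (1.0 - w) * score_gauss(x, 2.0)

rng = np.random.default_rng(0)
# Scarce target population, closer to the mu = +2 auxiliary model
target = rng.normal(1.0, 1.0, size=2000)

def sm_loss(w):
    # Hyvarinen score matching in 1-D: E[0.5 * s(x)^2 + s'(x)].
    # For this fused Gaussian family, s'(x) = -1 for every w.
    s = fused_score(target, w)
    return np.mean(0.5 * s**2 - 1.0)

# Learn the barycenter weight on the scarce target data (grid search
# here for simplicity; the paper's weights are learned differentiably)
grid = np.linspace(0.0, 1.0, 101)
w_star = grid[np.argmin([sm_loss(w) for w in grid])]
print(f"learned weight on the mu=-2 auxiliary model: {w_star:.2f}")
```

In this toy family the fused score equals the score of N(2 - 4w, 1), so minimizing the objective pulls 2 - 4w toward the target mean of roughly 1, i.e. w near 0.25, illustrating how score matching recovers barycenter weights that tilt the ensemble toward the target population.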
Problem

Research questions and friction points this paper is trying to address.

Fusing multiple pre-trained diffusion models effectively
Enhancing generative modeling with limited target data
Computing optimal KL barycenter weights via score matching
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fuses diffusion models via KL barycenters
Uses score matching for tractable learning
Enhances population heterogeneity in generation
Hao Liu
Stanford University, CA 94305, US
Junze Ye
Stanford University, CA 94305, US
Jose H. Blanchet
Stanford University, CA 94305, US
Nian Si
Hong Kong University of Science and Technology
Applied Probability · Experimental Design · Causal Inference