When Are Two Scores Better Than One? Investigating Ensembles of Diffusion Models

📅 2026-01-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates whether ensemble methods can improve the generation quality of unconditional score-based diffusion models, with a particular focus on perceptual metrics such as FID. It evaluates deep ensembles and Monte Carlo Dropout on CIFAR-10 and FFHQ under a range of aggregation rules, and random forests on tabular datasets. The findings indicate that ensembling the scores generally improves score-matching loss and likelihood, but does not consistently improve perceptual quality. The work also gives a theoretical account of summing score models, which connects ensembling to model composition techniques such as guidance, and finds that one aggregation strategy outperforms the others on tabular data.

📝 Abstract
Diffusion models now generate high-quality, diverse samples, and attention is increasingly focused on building more powerful models. Although ensembling is a well-known way to improve supervised models, its application to unconditional score-based diffusion models remains largely unexplored. In this work we investigate whether ensembling provides tangible benefits for generative modelling. We find that while ensembling the scores generally improves the score-matching loss and model likelihood, it fails to consistently enhance perceptual quality metrics such as FID on image datasets. We confirm this observation across a breadth of aggregation rules, using Deep Ensembles and Monte Carlo Dropout on CIFAR-10 and FFHQ. We investigate possible explanations for this discrepancy, such as the link between score estimation and image quality. We also study tabular data through random forests, and find that one aggregation strategy outperforms the others. Finally, we provide theoretical insights into the summing of score models, which shed light not only on ensembling but also on several model composition techniques (e.g. guidance).
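As a rough illustration of what "ensembling the scores" can mean, the sketch below averages the scores of a few toy one-dimensional Gaussian "members". It is not the paper's implementation: the arithmetic mean is only one possible aggregation rule, and the names `gaussian_score` and `average_scores`, the member means, and the unit variance are all assumptions made for the example. For equal-variance Gaussians, the averaged score coincides with the score of a Gaussian centred at the mean of the member means, which gives a small sanity check on how summed or averaged scores combine the underlying densities.

```python
# Toy sketch (hypothetical names and values, not the paper's code):
# average the scores of several 1-D Gaussian "ensemble members".

def gaussian_score(x, mu, sigma=1.0):
    """Score of N(mu, sigma^2) at x, i.e. d/dx log p(x)."""
    return -(x - mu) / sigma**2


def average_scores(x, score_fns):
    """Arithmetic mean of the members' score estimates at x."""
    values = [f(x) for f in score_fns]
    return sum(values) / len(values)


# Assumed member means; each member is a unit-variance Gaussian score.
means = (-0.5, 0.0, 0.8)
members = [lambda x, m=m: gaussian_score(x, m) for m in means]

x = 1.0
ensemble = average_scores(x, members)

# For equal variances, this matches the score of a single Gaussian
# whose mean is the average of the member means.
reference = gaussian_score(x, sum(means) / len(means))
print(ensemble, reference)
```

In a real diffusion model each member would be a neural network evaluated at a given noise level, and the aggregated score would replace the single-model score inside the sampler; those details are outside this sketch.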
Problem

Research questions and friction points this paper is trying to address.

diffusion models
ensembling
score-based models
generative modeling
perceptual quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion models
score-based generative models
model ensembling
perceptual quality
score aggregation
Raphael Razafindralambo
Université Côte d’Azur, Inria, CNRS, I3S, Maasai, Nice, France
Rémy Sun
Université Côte d’Azur, Inria, CNRS, I3S, Maasai, Nice, France
F. Precioso
Université Côte d’Azur, Inria, CNRS, I3S, Maasai, Nice, France
Damien Garreau
Professor for the Theory of Machine Learning, Julius-Maximilians-Universität Würzburg
Explainable AI, ensembles, change-point detection, comparison-based learning
Pierre-Alexandre Mattei
Research scientist, Inria, Université Côte d'Azur
Statistics, Machine learning, Latent variable models, Deep generative models