Breaking the Likelihood-Quality Trade-off in Diffusion Models by Merging Pretrained Experts

📅 2025-11-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Diffusion models face an inherent trade-off between perceptual quality and data likelihood: high-noise denoising prioritizes visual fidelity at the cost of lower likelihood, whereas low-noise optimization improves likelihood but degrades image quality. To address this, we propose a plug-and-play dual-expert sampling framework that requires no fine-tuning. During denoising, it dynamically switches between two pre-trained diffusion models—designated as the “quality expert” and the “likelihood expert”—based on noise level: the quality expert operates at high noise levels to establish global structure, while the likelihood expert takes over at low noise levels to refine pixel-level statistics. To our knowledge, this is the first method to jointly improve both perceptual quality and likelihood. On CIFAR-10 and ImageNet32, our joint sampling achieves superior or competitive performance in both FID/Inception Score and log-likelihood compared to single-expert baselines, effectively breaking the conventional trade-off barrier.

📝 Abstract
Diffusion models for image generation often exhibit a trade-off between perceptual sample quality and data likelihood: training objectives emphasizing high-noise denoising steps yield realistic images but poor likelihoods, whereas likelihood-oriented training overweights low-noise steps and harms visual fidelity. We introduce a simple plug-and-play sampling method that combines two pretrained diffusion experts by switching between them along the denoising trajectory. Specifically, we apply an image-quality expert at high noise levels to shape global structure, then switch to a likelihood expert at low noise levels to refine pixel statistics. The approach requires no retraining or fine-tuning -- only the choice of an intermediate switching step. On CIFAR-10 and ImageNet32, the merged model consistently matches or outperforms its base components, improving or preserving both likelihood and sample quality relative to each expert alone. These results demonstrate that expert switching across noise levels is an effective way to break the likelihood-quality trade-off in image diffusion models.
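The switching scheme in the abstract can be sketched in a few lines. The sketch below is illustrative, not the authors' implementation: `quality_expert` and `likelihood_expert` are hypothetical stand-in callables for the two pretrained denoisers, and the reverse-diffusion update is simplified to a single call per step. The only free parameter, as in the paper, is the switching step.

```python
# Sketch of dual-expert switching along the denoising trajectory.
# The two "experts" below are dummy placeholders for pretrained
# diffusion denoisers; a real sampler would perform a proper
# reverse-diffusion update at each step.

def quality_expert(x, t):
    # placeholder for the image-quality expert's denoising update
    return [v * 0.9 for v in x]

def likelihood_expert(x, t):
    # placeholder for the likelihood expert's denoising update
    return [v * 0.99 for v in x]

def dual_expert_sample(x, num_steps, switch_step):
    """Denoise from t = num_steps down to t = 1, switching experts once.

    High-noise steps (t > switch_step) use the quality expert to shape
    global structure; low-noise steps use the likelihood expert to
    refine pixel statistics. No retraining is involved: the only
    choice is the intermediate switching step.
    """
    for t in range(num_steps, 0, -1):
        expert = quality_expert if t > switch_step else likelihood_expert
        x = expert(x, t)  # one (simplified) reverse-diffusion update
    return x

sample = dual_expert_sample([1.0, -0.5], num_steps=10, switch_step=4)
```

Here the quality expert handles steps 10 through 5 and the likelihood expert handles steps 4 through 1; in practice the switching step would be tuned per dataset.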
Problem

Research questions and friction points this paper is trying to address.

Breaking the trade-off between image quality and likelihood in diffusion models
Combining pretrained denoising experts at different noise levels
Improving both visual fidelity and pixel statistics without retraining
Innovation

Methods, ideas, or system contributions that make the work stand out.

Merging two pretrained diffusion experts
Switching between experts along denoising trajectory
No retraining required for improved performance
Yasin Esfandiari
Saarland University
Stefan Bauer
Helmholtz AI, Technical University of Munich
Sebastian U. Stich
CISPA Helmholtz Center for Information Security
Andrea Dittadi
Helmholtz AI | Technical University of Munich
generative models · representation learning · machine learning · deep learning