PoGDiff: Product-of-Gaussians Diffusion Models for Imbalanced Text-to-Image Generation

📅 2025-02-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Diffusion models suffer significant degradation in generation quality on class-imbalanced datasets, primarily due to insufficient representation of minority-class samples in image-text pairs. To address this, we propose PoGDiff—a general fine-tuning framework that requires neither resampling nor architectural modification. PoGDiff innovatively incorporates the Product of Gaussians (PoG) distribution into the diffusion training objective, enabling conditional prediction based on semantically proximal text embeddings to mitigate label bias. It integrates DDPM, a CLIP-based text encoder, and a KL-divergence-optimized variant for end-to-end optimization. Evaluated on multiple real-world imbalanced benchmarks, PoGDiff consistently improves FID (average reduction of 12.3%), CLIP-Score (+8.7%), and minority-class fidelity and generation accuracy—achieving superior visual quality and semantic consistency simultaneously.

Technology Category

Application Category

📝 Abstract
Diffusion models have made significant advancements in recent years. However, their performance often deteriorates when trained or fine-tuned on imbalanced datasets. This degradation is largely due to the disproportionate representation of majority and minority data in image-text pairs. In this paper, we propose a general fine-tuning approach, dubbed PoGDiff, to address this challenge. Rather than directly minimizing the KL divergence between the predicted and ground-truth distributions, PoGDiff replaces the ground-truth distribution with a Product of Gaussians (PoG), which is constructed by combining the original ground-truth targets with the predicted distribution conditioned on a neighboring text embedding. Experiments on real-world datasets demonstrate that our method effectively addresses the imbalance problem in diffusion models, improving both generation accuracy and quality.
Problem

Research questions and friction points this paper is trying to address.

Address imbalance in text-to-image generation
Improve diffusion model performance
Enhance generation accuracy and quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Product-of-Gaussians replaces KL divergence
Combines ground-truth with predicted distribution
Improves accuracy and generation quality
🔎 Similar Papers
No similar papers found.
Z
Ziyan Wang
Sizhe Wei
Sizhe Wei
Georgia Institute of Technology
Robotics
X
Xiao Huo
H
Hao Wang