Hierarchical Clustering for Conditional Diffusion in Image Generation

πŸ“… 2024-10-22
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Traditional VAEs struggle to simultaneously achieve high-fidelity sample generation and a meaningful latent clustering structure in generative clustering tasks. This paper proposes TreeDiffusion, a two-stage framework that couples hierarchical clustering with conditional diffusion modeling: first, a VAE learns a hierarchical cluster structure and extracts cluster embeddings; second, these embeddings condition a diffusion model to generate high-fidelity, cluster-specific images. By pairing the two stages, TreeDiffusion preserves the interpretability of the learned cluster structure while overcoming the generative quality limitations of VAEs, and it supports cluster-level controllable generation and visualization of the learned representations. On multiple benchmark datasets, TreeDiffusion reduces FID by 22% and markedly improves intra-cluster consistency; qualitative experiments confirm concurrent gains in both generation quality and clustering interpretability.

πŸ“ Abstract
Finding clusters of data points with similar characteristics and generating new cluster-specific samples can significantly enhance our understanding of complex data distributions. While clustering has been widely explored using Variational Autoencoders, these models often lack generation quality in real-world datasets. This paper addresses this gap by introducing TreeDiffusion, a deep generative model that conditions Diffusion Models on hierarchical clusters to obtain high-quality, cluster-specific generations. The proposed pipeline consists of two steps: a VAE-based clustering model that learns the hierarchical structure of the data, and a conditional diffusion model that generates realistic images for each cluster. We propose this two-stage process to ensure that the generated samples remain representative of their respective clusters and enhance image fidelity to the level of diffusion models. A key strength of our method is its ability to create images for each cluster, providing better visualization of the learned representations by the clustering model, as demonstrated through qualitative results. This method effectively addresses the generative limitations of VAE-based approaches while preserving their clustering performance. Empirically, we demonstrate that conditioning diffusion models on hierarchical clusters significantly enhances generative performance, thereby advancing the state of generative clustering models.
Problem

Research questions and friction points this paper is trying to address.

Enhancing generative quality in clustering with diffusion models
Integrating hierarchical cluster representations into image generation
Overcoming VAE limitations through cluster-conditioned diffusion modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical VAE clustering learns latent data structure
Cluster-aware diffusion model generates conditioned images
Combining VAE structure with diffusion improves generation quality
Jorge da Silva Goncalves
Department of Computer Science, ETH Zurich, Zurich, Switzerland
Laura Manduchi
PhD student, ETH ZΓΌrich
deep learning Β· probabilistic modelling Β· clustering Β· semi-supervised representation learning
Moritz Vandenhirtz
PhD student, ETH Zurich
Generative Modeling Β· Interpretable Machine Learning Β· Computer Vision Β· Medical Data Science
Julia E. Vogt
Department of Computer Science, ETH Zurich, Zurich, Switzerland