Diffusion Autoencoders for Few-shot Image Generation in Hyperbolic Space

📅 2024-11-27

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Few-shot image generation must keep generated images consistent with the target class while keeping them diverse, and should ideally also offer fine-grained, interpretable control over their attributes. To address this, we propose a novel diffusion autoencoder framework that operates in hyperbolic space and uses hyperbolic geometry to model the hierarchical semantic relationships among images and texts. Adjusting the radius of a semantic code within the Poincaré disk provides fine-grained, interpretable control over semantic diversity. The architecture combines semantic priors from pretrained multimodal foundation models (e.g., CLIP) with a diffusion-based encoder-decoder whose semantic codes live in hyperbolic space, enabling high-fidelity, diverse, and optionally text-guided generation from only a few examples. Extensive experiments on multiple few-shot benchmarks show that the approach outperforms state-of-the-art methods and strikes a better balance among generation quality, diversity, controllability, and interpretability.
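
The summary does not say how Euclidean features (e.g., CLIP-style embeddings) are placed in hyperbolic space. A common choice is the exponential map at the origin of the Poincaré ball, with the logarithmic map as its inverse. The sketch below assumes that choice, a ball of curvature -c, and NumPy arrays; the function names and the random stand-in embeddings are illustrative and are not taken from the paper.

```python
import numpy as np

def expmap0(v: np.ndarray, c: float = 1.0) -> np.ndarray:
    """Map Euclidean tangent vectors at the origin into the Poincaré ball of curvature -c."""
    sqrt_c = np.sqrt(c)
    norm = np.clip(np.linalg.norm(v, axis=-1, keepdims=True), 1e-15, None)  # avoid 0/0
    # tanh squashes any norm into [0, 1), so the result lies strictly inside the ball
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

def logmap0(x: np.ndarray, c: float = 1.0) -> np.ndarray:
    """Inverse of expmap0: map ball points back to the tangent space at the origin."""
    sqrt_c = np.sqrt(c)
    norm = np.clip(np.linalg.norm(x, axis=-1, keepdims=True), 1e-15, None)
    # clip keeps artanh finite for points numerically on the boundary
    return np.arctanh(np.clip(sqrt_c * norm, None, 1 - 1e-7)) * x / (sqrt_c * norm)

# Hypothetical stand-in for a batch of projected 512-d image embeddings.
rng = np.random.default_rng(0)
v = rng.normal(size=(4, 512)) * 0.05
x = expmap0(v)
print(np.linalg.norm(x, axis=-1))  # all norms are < 1
```

Working in the tangent space at the origin keeps the encoder and decoder Euclidean, while distances and radii are measured in the ball.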

📝 Abstract

Few-shot image generation aims to generate diverse and high-quality images for an unseen class given only a few examples in that class. However, existing methods often suffer from a trade-off between image quality and diversity while offering limited control over the attributes of newly generated images. In this work, we propose Hyperbolic Diffusion Autoencoders (HypDAE), a novel approach that operates in hyperbolic space to capture hierarchical relationships among images and texts from seen categories. By leveraging pre-trained foundation models, HypDAE generates diverse new images for unseen categories with exceptional quality by varying semantic codes or guided by textual instructions. Most importantly, the hyperbolic representation introduces an additional degree of control over semantic diversity through the adjustment of radii within the hyperbolic disk. Extensive experiments and visualizations demonstrate that HypDAE significantly outperforms prior methods by achieving a superior balance between quality and diversity with limited data and offers a highly controllable and interpretable generation process.
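
Hyperbolic space suits hierarchies because geodesic distances grow rapidly toward the boundary of the disk. This leaves room for many well-separated fine-grained leaves around a few generic concepts near the origin. The sketch below illustrates this general property with the standard Poincaré-ball distance. It is not the HypDAE training objective, and `poincare_dist` is a hypothetical helper name.

```python
import numpy as np

def poincare_dist(x, y, c: float = 1.0):
    """Geodesic distance between points x and y in the Poincaré ball of curvature -c."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    sq_diff = np.sum((x - y) ** 2, axis=-1)
    denom = (1.0 - c * np.sum(x * x, axis=-1)) * (1.0 - c * np.sum(y * y, axis=-1))
    return np.arccosh(1.0 + 2.0 * c * sq_diff / denom) / np.sqrt(c)

# Two pairs with the same Euclidean gap (0.1): one near the origin, one near the boundary.
near_origin = poincare_dist([0.0, 0.0], [0.1, 0.0])      # ≈ 0.20
near_boundary = poincare_dist([0.85, 0.0], [0.95, 0.0])  # ≈ 1.15
print(f"near origin: {near_origin:.2f}, near boundary: {near_boundary:.2f}")
```

The same Euclidean step costs more than five times as much geodesic distance near the boundary. This is why specific, class-level codes can sit far apart there while sharing a common ancestor region closer to the origin.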

Problem

Research questions and friction points this paper is trying to address.

Balancing category consistency and image diversity

Providing control over attributes of generated images

Exploiting hierarchical relationships among seen categories to generate images of unseen categories from limited data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hyperbolic space captures hierarchical image relationships

Adjusts radii within the hyperbolic disk to control semantic diversity

Leverages pre-trained foundation models to improve few-shot generation quality
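
The abstract states that adjusting radii within the hyperbolic disk gives an extra degree of control over semantic diversity. The following is a minimal sketch of that geometric operation. It assumes semantic codes are points in a Poincaré ball of curvature -c and keeps a code's direction while moving it to a chosen geodesic distance from the origin. The function names are hypothetical, and the decoder that would turn the moved code into an image is not shown. Reading smaller radii as more generic, category-level semantics (and hence more diverse outputs) is an assumption based on the usual interpretation of hierarchical hyperbolic embeddings.

```python
import numpy as np

def hyperbolic_radius(x, c: float = 1.0):
    """Geodesic distance from the origin to x in the Poincaré ball of curvature -c."""
    sqrt_c = np.sqrt(c)
    return 2.0 / sqrt_c * np.arctanh(sqrt_c * np.linalg.norm(x, axis=-1))

def set_hyperbolic_radius(x, r: float, c: float = 1.0):
    """Slide x along the geodesic through the origin so its hyperbolic radius equals r >= 0."""
    sqrt_c = np.sqrt(c)
    x = np.asarray(x, dtype=float)
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    direction = x / np.clip(norm, 1e-15, None)
    # Invert r = (2 / sqrt(c)) * artanh(sqrt(c) * ||x||) to get the Euclidean norm.
    return np.tanh(sqrt_c * r / 2.0) / sqrt_c * direction

# Hypothetical semantic code already inside the disk (assumption: 2-d for readability).
code = np.array([0.6, 0.6])
for r in (0.5, 1.0, 2.0):
    moved = set_hyperbolic_radius(code, r)
    print(r, moved, hyperbolic_radius(moved))
```

Because only the radius changes, the code keeps its angular position, which under the assumption above corresponds to staying within the same semantic branch while becoming more generic or more specific.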