Diffusion Autoencoders for Few-shot Image Generation in Hyperbolic Space

📅 2024-11-27
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Few-shot image generation must keep generated images consistent with their class and diverse at the same time, while still offering fine-grained, interpretable control over attributes. To address this, we propose a diffusion-based autoencoder framework embedded in hyperbolic space, which uses hyperbolic geometry to model the hierarchical semantic structure of image-text pairs. Adjusting the radius within the Poincaré disk gives fine-grained, interpretable control over semantic diversity. The architecture combines semantic priors from pretrained multimodal models (e.g., CLIP), a variational encoder-decoder backbone, and a hyperbolic diffusion process, which together allow high-fidelity, diverse, and text-guided generation from only a few examples. Extensive experiments show that the approach significantly outperforms prior methods on few-shot benchmarks and balances generation quality, controllability, and interpretability.

📝 Abstract
Few-shot image generation aims to generate diverse and high-quality images for an unseen class given only a few examples in that class. However, existing methods often suffer from a trade-off between image quality and diversity while offering limited control over the attributes of newly generated images. In this work, we propose Hyperbolic Diffusion Autoencoders (HypDAE), a novel approach that operates in hyperbolic space to capture hierarchical relationships among images and texts from seen categories. By leveraging pre-trained foundation models, HypDAE generates diverse new images for unseen categories with exceptional quality by varying semantic codes or guided by textual instructions. Most importantly, the hyperbolic representation introduces an additional degree of control over semantic diversity through the adjustment of radii within the hyperbolic disk. Extensive experiments and visualizations demonstrate that HypDAE significantly outperforms prior methods by achieving a superior balance between quality and diversity with limited data and offers a highly controllable and interpretable generation process.

Problem

Research questions and friction points this paper is trying to address.

Balancing category consistency and image diversity
Providing control over attributes of generated images
Generating images for unseen classes from only a few examples while exploiting hierarchical structure

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hyperbolic space captures hierarchical image relationships
Adjusting radii within the hyperbolic disk controls semantic diversity
Leverages pre-trained models for few-shot generation quality
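
The radius adjustment mentioned above can be illustrated with a minimal sketch. It assumes the Poincaré ball model with curvature -1, where a point at Euclidean norm n sits at hyperbolic distance 2·artanh(n) from the origin. The function names `hyperbolic_radius` and `rescale_to_radius` are hypothetical and are not taken from the paper; how HypDAE maps a given radius to a level of semantic diversity is defined by the authors and is not reproduced here.

```python
import math


def hyperbolic_radius(point):
    """Hyperbolic distance from the origin to `point` in the Poincaré ball.

    Assumes curvature -1, so the distance is 2 * artanh(||point||).
    """
    norm = math.sqrt(sum(c * c for c in point))
    if norm >= 1.0:
        raise ValueError("point must lie strictly inside the unit ball")
    return 2.0 * math.atanh(norm)


def rescale_to_radius(point, target_radius):
    """Move `point` along its ray from the origin to a target hyperbolic radius.

    The direction is preserved; only the distance from the origin changes.
    A hyperbolic radius r corresponds to a Euclidean norm of tanh(r / 2).
    """
    if target_radius < 0.0:
        raise ValueError("target_radius must be non-negative")
    norm = math.sqrt(sum(c * c for c in point))
    if norm == 0.0:
        raise ValueError("the origin has no direction to rescale along")
    scale = math.tanh(target_radius / 2.0) / norm
    return [c * scale for c in point]


# Illustrative latent code (made-up values, not from the paper).
code = [0.3, 0.4]
moved = rescale_to_radius(code, target_radius=1.0)
```

Rescaling along the ray keeps the direction of the code fixed and changes only its depth in the disk, which is one plausible reading of "adjusting the radius" as a single control knob.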