Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance

📅 2024-10-29

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing text-to-image (T2I) diffusion models struggle to generate rare compositions of concepts (e.g., “transparent ceramic cat”). This paper proposes R2F, a training-free framework that uses a large language model (LLM) at inference time to map each rare concept onto a semantically related frequent one, and then uses the frequent concept to guide diffusion sampling toward the rare target. The approach is motivated by empirical and theoretical analysis showing that exposing relevant frequent concepts during sampling yields more accurate concept composition. R2F works with any pre-trained diffusion model and LLM and can be combined with region-guided diffusion approaches. On three benchmarks, including the newly proposed RareBench, R2F improves T2I alignment by up to 28.1 percentage points over existing models, including SD3.0 and FLUX.

📝 Abstract

State-of-the-art text-to-image (T2I) diffusion models often struggle to generate rare compositions of concepts, e.g., objects with unusual attributes. In this paper, we show that the compositional generation power of diffusion models on such rare concepts can be significantly enhanced by Large Language Model (LLM) guidance. We start with empirical and theoretical analysis, demonstrating that exposing frequent concepts relevant to the target rare concepts during the diffusion sampling process yields more accurate concept composition. Based on this, we propose a training-free approach, R2F, that plans and executes the overall rare-to-frequent concept guidance throughout the diffusion inference by leveraging the abundant semantic knowledge in LLMs. Our framework is flexible across any pre-trained diffusion model and LLM, and can be seamlessly integrated with region-guided diffusion approaches. In extensive experiments on three datasets, including our newly proposed benchmark, RareBench, which contains various prompts with rare compositions of concepts, R2F significantly surpasses existing models, including SD3.0 and FLUX, by up to 28.1%p in T2I alignment. Code is available at https://github.com/krafton-ai/Rare-to-Frequent.
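
To make the idea of rare-to-frequent guidance during sampling more concrete, here is a minimal Python sketch of a per-step prompt schedule. It is not the authors' implementation. The function name `build_prompt_schedule`, the `switch_fraction` parameter, the hard switch from the frequent prompt to the rare prompt, and the example frequent prompt are all assumptions made for illustration. In the actual method, an LLM plans the guidance, and the exact schedule may differ.

```python
from typing import List


def build_prompt_schedule(
    rare_prompt: str,
    frequent_prompt: str,
    num_steps: int,
    switch_fraction: float = 0.4,
) -> List[str]:
    """Return one text prompt per denoising step.

    Hypothetical rule (an assumption, not taken from the paper): the first
    `switch_fraction` of the steps are conditioned on the frequent prompt,
    and all remaining steps on the original rare prompt.
    """
    if num_steps <= 0:
        raise ValueError("num_steps must be positive")
    if not 0.0 <= switch_fraction <= 1.0:
        raise ValueError("switch_fraction must be in [0, 1]")

    # Index of the first step that uses the rare prompt.
    switch_step = int(num_steps * switch_fraction)
    return [
        frequent_prompt if step < switch_step else rare_prompt
        for step in range(num_steps)
    ]


if __name__ == "__main__":
    # The frequent prompt is hard-coded here; in an R2F-style pipeline it
    # would be proposed by an LLM (hypothetical example pair).
    schedule = build_prompt_schedule(
        rare_prompt="a transparent ceramic cat",
        frequent_prompt="a ceramic cat",
        num_steps=10,
        switch_fraction=0.4,
    )
    for step, prompt in enumerate(schedule):
        print(step, prompt)
```

A sampler would then look up `schedule[step]` to choose the text conditioning at each denoising step. Early steps see the frequent concept and later steps see the original rare prompt.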

Problem

Research questions and friction points this paper is trying to address.

Text-to-image generation

Rare objects

Unusual features

Innovation

Methods, ideas, or system contributions that make the work stand out.

R2F

Large Language Models

Text-to-Image Synthesis
