DiffExp: Efficient Exploration in Reward Fine-tuning for Text-to-Image Diffusion Models

📅 2025-02-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the inefficient exploration and slow convergence caused by online sampling in reward-based fine-tuning of text-to-image diffusion models, this paper proposes a dual-path exploration mechanism. First, it dynamically modulates the classifier-free guidance scale to increase generative diversity; second, it randomly reweights phrases of the text prompt to concentrate sampling on high-reward regions. The authors present this as the first work to jointly model dynamic guidance scheduling and phrase-level weight perturbation, significantly improving sample efficiency. Evaluated on mainstream frameworks, including DDPO and AlignProp, the method achieves an average 18.7% improvement in reward score under identical sample budgets, reduces training steps by 32%, and demonstrates faster convergence and stronger generalization across multiple benchmarks.

📝 Abstract
Fine-tuning text-to-image diffusion models to maximize rewards has proven effective for enhancing model performance. However, reward fine-tuning methods often suffer from slow convergence due to online sample generation. Therefore, obtaining diverse samples with strong reward signals is crucial for improving sample efficiency and overall performance. In this work, we introduce DiffExp, a simple yet effective exploration strategy for reward fine-tuning of text-to-image models. Our approach employs two key strategies: (a) dynamically adjusting the scale of classifier-free guidance to enhance sample diversity, and (b) randomly weighting phrases of the text prompt to exploit high-quality reward signals. We demonstrate that these strategies significantly enhance exploration during online sample generation, improving the sample efficiency of recent reward fine-tuning methods, such as DDPO and AlignProp.
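Strategy (a) above, dynamically adjusting the classifier-free guidance (CFG) scale, can be sketched in a few lines. The schedule below is a hypothetical illustration, not the paper's actual schedule: it draws a random scale early in denoising (to diversify online samples) and falls back to a fixed high scale later (to preserve fidelity); the bounds `low`/`high` and the 0.5 cutoff are assumptions.

```python
import random


def cfg_combine(eps_uncond, eps_cond, scale):
    """Standard classifier-free guidance: interpolate past the unconditional
    prediction toward the conditional one by the guidance scale."""
    return eps_uncond + scale * (eps_cond - eps_uncond)


def sample_guidance_scale(step, total_steps, low=2.0, high=12.0, seed=None):
    """Hypothetical dynamic CFG schedule for exploration.

    Early denoising steps (first half) get a randomly drawn scale, so repeated
    online rollouts of the same prompt diverge; later steps use a fixed high
    scale. The specific bounds and cutoff are illustrative assumptions.
    """
    rng = random.Random(seed)
    frac = step / total_steps  # 0.0 at the start of denoising, 1.0 at the end
    if frac < 0.5:
        return rng.uniform(low, high)  # explore: randomize guidance strength
    return high                        # exploit: stable, high-fidelity guidance
```

In an actual sampler loop, `sample_guidance_scale` would be called once per denoising step (or once per trajectory) before combining the two noise predictions with `cfg_combine`.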
Problem

Research questions and friction points this paper is trying to address.

Enhance sample diversity in reward fine-tuning
Improve convergence speed of diffusion models
Optimize text-to-image model performance efficiently
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic classifier-free guidance scaling
Random text prompt phrase weighting
Enhanced exploration in online generation
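The second contribution, random phrase weighting, can be sketched as follows. This is a minimal illustration under assumptions not taken from the paper: phrases are split on commas, and each phrase receives a random emphasis weight drawn from a uniform range. In a real pipeline these weights would scale the corresponding segments of the text embedding; here the function simply returns the (phrase, weight) pairs.

```python
import random


def reweight_phrases(prompt, low=0.5, high=1.5, seed=None):
    """Hypothetical phrase-level random reweighting for exploration.

    Splits the prompt into comma-separated phrases and assigns each a random
    emphasis weight in [low, high]. The splitting rule and weight range are
    illustrative assumptions, not the paper's exact procedure.
    """
    rng = random.Random(seed)
    phrases = [p.strip() for p in prompt.split(",") if p.strip()]
    return [(phrase, rng.uniform(low, high)) for phrase in phrases]
```

For example, `reweight_phrases("a cat, wearing a hat, oil painting")` yields three weighted phrases; resampling the weights across online rollouts varies which parts of the prompt dominate generation, exposing the reward model to a wider range of candidate images.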