Automated Filtering of Human Feedback Data for Aligning Text-to-Image Diffusion Models

📅 2024-10-14
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the slow convergence of alignment training for text-to-image diffusion models on large, noisy human feedback datasets, this paper proposes FiFA, an automatic data filtering algorithm. FiFA jointly models three criteria: preference margin (estimated with a proxy reward model), text quality (assessed by a large language model to screen out harmful content), and text diversity (quantified with a k-nearest-neighbor entropy estimator). These are combined into a unified optimization objective, enabling fully automated data curation without human intervention. By assigning an importance score to each preference pair and training with direct preference optimization (DPO) on the selected subset, FiFA is preferred by humans 17% more while using less than 0.5% of the original dataset and about 1% of the GPU hours, and it markedly improves training stability. The method scales to arbitrarily large human feedback datasets.

📝 Abstract
Fine-tuning text-to-image diffusion models with human feedback is an effective method for aligning model behavior with human intentions. However, this alignment process often suffers from slow convergence due to the large size of, and noise present in, human feedback datasets. In this work, we propose FiFA, a novel automated data filtering algorithm designed to enhance the fine-tuning of diffusion models using human feedback datasets with direct preference optimization (DPO). Specifically, our approach selects data by solving an optimization problem that maximizes three components: preference margin, text quality, and text diversity. The preference margin, calculated with a proxy reward model, identifies samples that are highly informative despite the noisy nature of feedback datasets. Additionally, we incorporate text quality, assessed by large language models to prevent harmful content, and text diversity, measured with a k-nearest-neighbor entropy estimator to improve generalization. Finally, we integrate all these components into an optimization process, approximating the solution by assigning an importance score to each data pair and selecting the most important ones. As a result, our method filters data efficiently and automatically, without manual intervention, and can be applied to any large-scale dataset. Experimental results show that FiFA significantly enhances training stability and achieves better performance, being preferred by humans 17% more, while using less than 0.5% of the full data and thus 1% of the GPU hours compared to utilizing full human feedback datasets.
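The selection step described in the abstract can be sketched in Python. This is a minimal illustration, not the paper's implementation: the reward scores, prompt embeddings, quality scores, and the weight `beta` are hypothetical stand-ins, and the set-level diversity term is shown only as a standalone k-NN entropy estimate rather than folded into the per-pair importance score as FiFA does.

```python
import numpy as np

def preference_margin(r_win, r_lose):
    # Margin of the preferred over the rejected image under a proxy reward
    # model (hypothetical scalar scores standing in for a learned model).
    return r_win - r_lose

def knn_entropy(emb, k=3):
    # Kozachenko-Leonenko style k-NN entropy estimate over prompt embeddings,
    # up to additive constants; a higher value means more diverse prompts.
    d2 = ((emb[:, None, :] - emb[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)           # exclude each point as its own neighbor
    kth = np.sort(np.sqrt(d2), axis=1)[:, k - 1]
    return emb.shape[1] * float(np.mean(np.log(kth + 1e-12)))

def select_pairs(r_win, r_lose, quality, fraction=0.005, beta=1.0):
    # Per-pair importance score: preference margin plus a weighted
    # text-quality term; keep only the top `fraction` of pairs
    # (the paper keeps under 0.5% of the full dataset).
    scores = preference_margin(r_win, r_lose) + beta * quality
    k = max(1, int(round(len(scores) * fraction)))
    return np.argsort(scores)[::-1][:k]
```

The selected indices would then define the small DPO training subset; in the paper the three criteria are balanced inside one optimization objective rather than by a fixed linear combination.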
Problem

Research questions and friction points this paper is trying to address.

Automated filtering of noisy human feedback data for diffusion models
Enhancing fine-tuning with preference margin, text quality, and diversity
Improving training stability and performance with minimal data usage
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated data filtering algorithm FiFA
Maximizes preference margin, text quality, diversity
Uses proxy reward model for optimization
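The importance-weighted DPO training that the filtered pairs feed into can be sketched as follows. This is a simplified scalar-log-likelihood version (DPO for diffusion models in practice compares per-pair denoising losses rather than exact log-likelihoods); the argument names and the weighting scheme are illustrative assumptions, not the paper's code.

```python
import math

def weighted_dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, weights, beta=0.1):
    # Importance-weighted DPO loss over preference pairs.
    # logp_* : model log-likelihoods of preferred (w) and rejected (l) samples
    # ref_logp_* : the same under a frozen reference model
    # weights : per-pair importance scores from the filtering step
    total, wsum = 0.0, 0.0
    for lw, ll, rw, rl, wt in zip(logp_w, logp_l, ref_logp_w, ref_logp_l, weights):
        logits = beta * ((lw - rw) - (ll - rl))
        total += -wt * math.log(1.0 / (1.0 + math.exp(-logits)))  # -log sigmoid
        wsum += wt
    return total / wsum
```

At initialization the model matches the reference, so every logit is zero and the loss equals log 2; training then pushes the preferred samples' likelihood up relative to the reference faster than the rejected samples'.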