🤖 AI Summary
Automated evaluation of personalized image generation currently faces a fundamental dilemma: existing metrics correlate poorly with human preferences, while human evaluation remains costly and slow. To address this, we propose the first multimodal GPT-driven benchmark that achieves high alignment with human judgments. Our method introduces a task-reinforced, self-aligned prompting mechanism for GPT, in which systematic prompt engineering and explicit human preference modeling jointly yield evaluation outcomes highly consistent with manual scoring. We further construct a high-quality, multi-scenario evaluation dataset. Extensive validation across seven state-of-the-art generative models demonstrates that our benchmark improves Spearman correlation with human ratings by over 42%, overcoming both the inaccuracy of automated metrics and the inefficiency of human evaluation. This work advances the evaluation paradigm for generative AI.
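To make the alignment measurement concrete, below is a minimal sketch (in Python, with placeholder scores rather than the paper's data) of computing the Spearman rank correlation between an automated metric and human ratings over the same set of generated images; this is the statistic behind the 42% improvement claim.

```python
# Minimal sketch of measuring human alignment: Spearman rank correlation
# between an automated metric's scores and human ratings for the same images.
# The score arrays below are hypothetical placeholders, not benchmark data.
from scipy.stats import spearmanr

human_ratings = [4, 2, 5, 3, 1, 4, 5, 2]                        # e.g., 1-5 Likert scores
metric_scores = [0.81, 0.42, 0.90, 0.55, 0.30, 0.77, 0.88, 0.47]  # automated scores

# rho near 1.0 means the metric ranks images the way humans do.
rho, p_value = spearmanr(human_ratings, metric_scores)
print(f"Spearman rho = {rho:.3f} (p = {p_value:.3g})")
```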
📝 Abstract
Personalized image generation holds great promise for assisting humans in everyday work and life, thanks to its impressive ability to creatively generate personalized content across diverse contexts. However, current evaluations are either automated but misaligned with human judgment, or human-based but time-consuming and expensive. In this work, we present DreamBench++, a human-aligned benchmark automated by advanced multimodal GPT models. Specifically, we systematically design prompts that make GPT both human-aligned and self-aligned, empowered with task reinforcement. Further, we construct a comprehensive dataset comprising diverse images and prompts. By benchmarking 7 modern generative models, we demonstrate that DreamBench++ yields significantly more human-aligned evaluation, surfacing innovative findings that benefit the community.
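As a rough illustration of GPT-automated scoring, here is a minimal sketch of prompting a multimodal GPT model to rate one personalized generation result. The rubric text, the model name (`gpt-4o`), and the `gpt_score` helper are assumptions for illustration only, not DreamBench++'s actual prompts or code.

```python
# Minimal sketch (assumes OpenAI Python SDK >= 1.0 and a multimodal model
# such as gpt-4o; the rubric is illustrative, not the paper's actual prompt)
# of asking a GPT model to score a personalized image generation result.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

RUBRIC = (
    "You are evaluating a personalized image generation result. "
    "Score from 0 (worst) to 4 (best) how well the generated image "
    "preserves the subject in the reference image while following the "
    "text prompt. Reply with the integer score only."
)

def gpt_score(reference_url: str, generated_url: str, text_prompt: str) -> int:
    """Ask a multimodal GPT model to rate one generated image (hypothetical helper)."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": f"{RUBRIC}\nText prompt: {text_prompt}"},
                {"type": "image_url", "image_url": {"url": reference_url}},
                {"type": "image_url", "image_url": {"url": generated_url}},
            ],
        }],
    )
    return int(response.choices[0].message.content.strip())
```

In this sketch, human alignment would then be assessed by correlating such per-image GPT scores against human ratings, as in the Spearman computation above.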