IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models

📅 2025-01-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing text-to-image (T2I) evaluation methods inadequately assess generalization across novel domains and multi-task scenarios. Method: We propose IMAGINE-E, a comprehensive, multidimensional T2I benchmark covering five task domains: structured output generation, realism and physical consistency, specific-domain generation, challenging-scenario generation, and multi-style creation. Its evaluation framework combines controllable prompt engineering, multi-granularity human assessment, domain-expert annotation, and reproducible automated scripts, moving beyond single-metric evaluation. Results: A systematic evaluation of six state-of-the-art models, including FLUX.1 and Ideogram 2.0, reveals strengths in structured and specific-domain tasks alongside consistent weaknesses. IMAGINE-E thereby characterizes the current capability boundaries of T2I models as they evolve toward general-purpose AI infrastructure.

📝 Abstract
With the rapid development of diffusion models, text-to-image (T2I) models have made significant progress, showcasing impressive abilities in prompt following and image generation. Recently launched models such as FLUX.1 and Ideogram 2.0, along with others like DALL-E 3 and Stable Diffusion 3, have demonstrated exceptional performance across various complex tasks, raising the question of whether T2I models are moving toward general-purpose applicability. Beyond traditional image generation, these models exhibit capabilities across a range of fields, including controllable generation, image editing, video, audio, 3D, and motion generation, as well as computer vision tasks such as semantic segmentation and depth estimation. However, current evaluation frameworks are insufficient to comprehensively assess model performance across these expanding domains. To thoroughly evaluate these models, we developed IMAGINE-E and tested six prominent models: FLUX.1, Ideogram 2.0, Midjourney, DALL-E 3, Stable Diffusion 3, and Jimeng. Our evaluation is divided into five key domains: structured output generation, realism and physical consistency, specific-domain generation, challenging-scenario generation, and multi-style creation tasks. This comprehensive assessment highlights each model's strengths and limitations, particularly the outstanding performance of FLUX.1 and Ideogram 2.0 in structured and specific-domain tasks, underscoring the expanding applications and potential of T2I models as foundational AI tools. This study provides valuable insights into the current state and future trajectory of T2I models as they evolve toward general-purpose usability. Evaluation scripts will be released at https://github.com/jylei16/Imagine-e.
Problem

Research questions and friction points this paper is trying to address.

Text-to-Image Generation
Model Evaluation
Performance Measurement
Innovation

Methods, ideas, or system contributions that make the work stand out.

IMAGINE-E
T2I Model Evaluation
Multifaceted Assessment
👥 Authors
Jiayi Lei — Shanghai Jiao Tong University, Shanghai AI Laboratory
Renrui Zhang — Seed ByteDance & MMLab & PKU (Large Multimodal Model, Generative Model, Embodied AI)
Xiangfei Hu — Shanghai Jiao Tong University, Shanghai AI Laboratory
Weifeng Lin — The Chinese University of Hong Kong (Deep Learning, Computer Vision)
Zhen Li — CUHK MMLab
Wenjian Sun — Shanghai Jiao Tong University, Shanghai AI Laboratory
Ruoyi Du — Shanghai AI Laboratory
Le Zhuo — Krea AI (generative models, multi-modal learning)
Zhongyu Li — Shanghai AI Laboratory
Xinyue Li — Shanghai AI Laboratory
Shitian Zhao — Shanghai AI Laboratory (LLM, MLLM, Generative Model)
Ziyu Guo — The Chinese University of Hong Kong (Multi-modality Learning, LLM/VLMs, 3D Vision)
Yiting Lu — University of Science and Technology of China (VLM, Self-evolving Agent, Reasoning Model)
Peng Gao — Shanghai AI Laboratory
Hongsheng Li — CUHK MMLab