NTIRE 2025 challenge on Text to Image Generation Model Quality Assessment

📅 2025-05-22

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

To address fine-grained quality assessment of text-to-image (T2I) generation models, this challenge evaluates models along two decoupled dimensions: (1) image–text semantic alignment and (2) pixel-level structural distortion detection. It introduces two dedicated benchmarks: EvalMuse-40K, comprising about 40K image–text pairs generated by multiple T2I models, and EvalMuse-Structure, the first multi-model benchmark with pixel-level distortion annotations. Methodologically, the work integrates multi-scale feature alignment modeling, structure-sensitive visual perception metrics, and a cross-model generalization evaluation framework. In both tracks, the top-performing solutions significantly outperform the baseline methods. Final submissions from 20 participating teams (12 in the alignment track, 8 in the structure track) support the effectiveness and robustness of this two-track evaluation for diagnosing real-world T2I models.

📝 Abstract

This paper reports on the NTIRE 2025 challenge on Text to Image (T2I) generation model quality assessment, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2025. The aim of the challenge is to address the fine-grained quality assessment of text-to-image generation models. It evaluates text-to-image models from two aspects, image-text alignment and image structural distortion detection, and is accordingly divided into an alignment track and a structure track. The alignment track uses EvalMuse-40K, which contains around 40K AI-Generated Images (AIGIs) generated by 20 popular generative models. The alignment track had 371 registered participants. A total of 1,883 submissions were received in the development phase and 507 in the test phase. Finally, 12 participating teams submitted their models and fact sheets. The structure track uses EvalMuse-Structure, which contains 10,000 AIGIs with corresponding structural distortion masks. The structure track had 211 registered participants. A total of 1,155 submissions were received in the development phase and 487 in the test phase. Finally, 8 participating teams submitted their models and fact sheets. Almost all methods achieved better results than the baseline methods, and the winning methods in both tracks demonstrated superior prediction performance on T2I model quality assessment.

Problem

Research questions and friction points this paper is trying to address.

Assessing fine-grained quality of text-to-image generation models

Evaluating image-text alignment in generated images

Detecting structural distortions in AI-generated images

Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluates T2I models via alignment and structure tracks

Uses EvalMuse-40K dataset for alignment assessment

Employs EvalMuse-Structure dataset for distortion detection

🔎 Similar Papers

Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings