NTIRE 2025 challenge on Text to Image Generation Model Quality Assessment

📅 2025-05-22
📈 Citations: 1
Influential citations: 0
🤖 AI Summary
To address the challenge of fine-grained quality assessment for text-to-image (T2I) generation models, this work proposes a dual-dimensional decoupled evaluation paradigm: (1) image–text semantic alignment and (2) pixel-level structural distortion detection. We introduce two dedicated benchmarks—EvalMuse-40K, comprising 40K diverse image–text pairs spanning outputs from multiple T2I models, and EvalMuse-Structure, the first multi-model benchmark with pixel-level distortion annotations. Methodologically, we integrate multi-scale feature alignment modeling, structure-sensitive visual perception metrics, and a cross-model generalization evaluation framework. In both evaluation tracks, all top-performing solutions significantly outperform baseline methods. Empirical validation involving 20 participating teams confirms the paradigm’s strong effectiveness and robustness in diagnosing real-world T2I models.

📝 Abstract
This paper reports on the NTIRE 2025 challenge on Text to Image (T2I) generation model quality assessment, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2025. The challenge addresses the fine-grained quality assessment of text-to-image generation models. It evaluates them from two aspects, image-text alignment and image structural distortion detection, and is accordingly divided into an alignment track and a structure track. The alignment track uses EvalMuse-40K, which contains around 40K AI-Generated Images (AIGIs) generated by 20 popular generative models. It had 371 registered participants; 1,883 submissions were received in the development phase and 507 in the test phase, and 12 participating teams ultimately submitted their models and fact sheets. The structure track uses EvalMuse-Structure, which contains 10,000 AIGIs with corresponding structural distortion masks. It had 211 registered participants; 1,155 submissions were received in the development phase and 487 in the test phase, and 8 participating teams ultimately submitted their models and fact sheets. Almost all methods achieved better results than the baseline methods, and the winning methods in both tracks demonstrated superior prediction performance on T2I model quality assessment.
Problem

Research questions and friction points this paper is trying to address.

Assessing fine-grained quality of text-to-image generation models
Evaluating image-text alignment in generated images
Detecting structural distortions in AI-generated images
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluates T2I models via alignment and structure tracks
Uses EvalMuse-40K dataset for alignment assessment
Employs EvalMuse-Structure dataset for distortion detection
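
The two tracks listed above can be illustrated with a minimal scoring sketch. The page does not state which metrics the challenge used, so the choices here are assumptions made purely for illustration: Spearman rank correlation between predicted and human alignment scores, and intersection-over-union (IoU) between a predicted and an annotated distortion mask. The function names `spearman` and `mask_iou` are hypothetical and are not taken from the challenge toolkit.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def spearman(pred: Sequence[float], human: Sequence[float]) -> float:
    """Assumed alignment-track metric: Pearson correlation of the ranks."""
    return pearson(average_ranks(pred), average_ranks(human))


def mask_iou(pred_mask: Sequence[Sequence[int]],
             true_mask: Sequence[Sequence[int]]) -> float:
    """Assumed structure-track metric: IoU of two binary masks."""
    inter = union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks are treated as a perfect match.
    return inter / union if union else 1.0


if __name__ == "__main__":
    # Toy values, not challenge data.
    print(spearman([0.1, 0.4, 0.35, 0.8], [1, 3, 2, 5]))
    print(mask_iou([[1, 1], [0, 0]], [[1, 0], [0, 0]]))
```

Rank correlation only rewards getting the ordering of images right, whereas IoU penalises both missed and spurious distorted pixels, which is why a per-track metric is sketched rather than a single shared one.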
Authors

Shuhao Han
Haotian Fan
Fangyuan Kong
Wenjie Liao
Chunle Guo (Nankai University; Deep Learning, Image Enhancement)
Chongyi Li (Professor, Nankai University; Computer Vision, Computational Imaging, Computational Photography, Underwater Imaging)
Radu Timofte (Humboldt Professor for AI and Computer Vision, University of Würzburg; Computer Vision, Machine Learning, AI, Compression, Computational Photography)
Liang Li
Tao Li
Junhui Cui
Yunqiu Wang
Yang Tai
Jingwei Sun
Jianhui Sun (University of Virginia; Data Mining, Optimization, Deep Learning)
Xinli Yue
Tianyi Wang
Huan Hou
Junda Lu
Xinyang Huang
Zitang Zhou (Beijing University of Posts and Telecommunications; multimodal, large language models)
Zijian Zhang
Xuhui Zheng
Xuecheng Wu
Chong Peng (Qingdao University; Machine Learning, Computer Vision)
Xuezhi Cao (Meituan; Data Mining, Knowledge Graph, LLMs)
Trong-Hieu Nguyen-Mau
Minh-Hoang Le
Minh-Khoa Le-Phan
Duy-Nam Ly
Hai-Dang Nguyen
Minh-Triet Tran (University of Science & John von Neumann Institute, VNU-HCM; Cryptography and Security, Multimedia and Interaction, Computer Vision and Machine Learning, Software Engineering)
Yukang Lin
Yan Hong
Chuanbiao Song
Siyuan Li
Jun Lan (Ant Group)
Zhichao Zhang (School of Mathematics and Statistics, NUIST; Graph Signal Processing, Graph Neural Network, Image Processing)
Xinyue Li
Wei Sun
Zicheng Zhang
Yunhao Li
Xiaohong Liu
Guangtao Zhai (Professor, IEEE Fellow, Shanghai Jiao Tong University; Multimedia Signal Processing, Visual Quality Assessment, QoE, AI Evaluation, Displays)
Zitong Xu (Shanghai Jiao Tong University; Image Quality Assessment, Image Editing)
Huiyu Duan (Shanghai Jiao Tong University; Multimedia Signal Processing)
Jiarui Wang
Guangji Ma
Liu Yang
Lu Liu
Qiang Hu
Xiongkuo Min
Zichuan Wang
Zhenchen Tang
Bo Peng
Jing Dong
Fengbin Guan (University of Science and Technology of China; Image/Video Quality Assessment VLM)
Zihao Yu (University of Science and Technology of China)
Yiting Lu (University of Science and Technology of China; VLM, Self-evolving Agent, Reasoning Model)
Wei Luo
Xin Li
Minhao Lin
Haofeng Chen
Xuanxuan He
Kele Xu
Qisheng Xu
Zijian Gao
Tianjiao Wan
Bo-Cheng Qiu
Chih-Chung Hsu (Associate Professor of Institute of Intelligent Systems, College of AI, NYCU; Deep learning, Image processing, computer vision, image compression, video editing)
Chia-ming Lee
Yu-Fan Lin (Institute of Data Science, National Cheng Kung University, Taiwan; computer vision, deep learning, multimodal learning)
Bo Yu
Zehao Wang
Da Mu
Mingxiu Chen
Junkang Fang
Huamei Sun
Wending Zhao
Zhiyu Wang
Wang Liu
Weikang Yu (PhD Researcher, HZDR & Technical University of Munich; Remote sensing, Deep learning)
Puhong Duan
Bin Sun
Xudong Kang
Shutao Li
Shuai He
Lingzhi Fu
Heng Cong
Rongyu Zhang
Jiarong He
Zhishan Qiao
Yongqing Huang
Zewen Chen
Zhe Pang
Juan Wang
Jianmin Guo (Tsinghua University; software testing, deep learning systems, adversarial training)
Zhizhuo Shao
Ziyu Feng
Bing Li
Weiming Hu
Hesong Li
Dehua Liu
Zeming Liu
Qingsong Xie
Ruichen Wang (University of Maryland, College Park; Wireless Communication, mmWave Communication, Propagation modeling)
Zhihao Li
Yuqi Liang (HKBU)
Jianqi Bi
Jun Luo
Junfeng Yang
Can Li
Jing Fu
Hongwei Xu
Mingrui Long
Lulin Tang