NTIRE 2025 XGC Quality Assessment Challenge: Methods and Results

📅 2025-06-03
📈 Citations: 1
Influential citations: 0
🤖 AI Summary
This work reports on a challenge for assessing the perceptual quality of user-generated videos (UGVs), AI-generated videos (AIGVs), and talking-head videos (THVs), together forming a cross-source, no-reference video quality assessment benchmark. The challenge builds on three datasets: FineVD-GC (fine-grained distortion annotations for user-generated videos), Q-Eval-Video (multi-dimensional subjective quality scores for AI-generated videos), and THQA-NTIRE (talking-head-specific quality annotations). Methodologically, participating teams combined multi-scale spatiotemporal modeling, artifact-aware networks, cross-domain transfer learning, and ensemble distillation for end-to-end quality prediction. Every finalist method in every track surpassed the baseline; out of more than 1,300 submissions across the development and test phases, 5, 6, and 8 teams in the three tracks respectively delivered final models with fact sheets. This advances the state of the art in quality assessment for emerging generative video content, particularly AIGVs.
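The summary names ensembling among the teams' techniques without giving detail. Purely as an illustration (the page contains no code, and the arrays, weights, and function names below are hypothetical, not any team's implementation), score-level fusion of several no-reference quality predictors can be sketched in Python as:

```python
# Minimal sketch of score-level ensembling for no-reference video quality
# prediction. All data below is synthetic and stands in for real model output.
from __future__ import annotations

import numpy as np

def zscore(x: np.ndarray) -> np.ndarray:
    """Standardize one model's scores so models on different scales can be fused."""
    return (x - x.mean()) / (x.std() + 1e-8)

def ensemble_scores(per_model_scores: list[np.ndarray],
                    weights: list[float] | None = None) -> np.ndarray:
    """Weighted average of z-normalized per-model quality predictions."""
    stacked = np.stack([zscore(s) for s in per_model_scores])  # (n_models, n_videos)
    w = np.ones(len(per_model_scores)) if weights is None else np.asarray(weights, float)
    w = w / w.sum()
    return w @ stacked  # one fused score per video

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mos = rng.uniform(1, 5, size=100)  # hypothetical ground-truth MOS
    preds = [mos + rng.normal(0, s, size=100) for s in (0.3, 0.5, 0.8)]  # three noisy "models"
    print(ensemble_scores(preds, weights=[0.5, 0.3, 0.2])[:5])
```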

📝 Abstract
This paper reports on the NTIRE 2025 XGC Quality Assessment Challenge, held in conjunction with the New Trends in Image Restoration and Enhancement (NTIRE) workshop at CVPR 2025. The challenge addresses a major problem in video and talking-head processing and is divided into three tracks: user-generated video, AI-generated video, and talking head. The user-generated video track uses FineVD-GC, which contains 6,284 user-generated videos. This track attracted 125 registered participants; 242 submissions were received in the development phase and 136 in the test phase, and 5 teams ultimately submitted their models and fact sheets. The AI-generated video track uses Q-Eval-Video, which contains 34,029 AI-generated videos (AIGVs) produced by 11 popular text-to-video (T2V) models. This track attracted 133 registered participants; 396 submissions were received in the development phase and 226 in the test phase, and 6 teams ultimately submitted their models and fact sheets. The talking head track uses THQA-NTIRE, which contains 12,247 2D and 3D talking heads. This track attracted 89 registered participants; 225 submissions were received in the development phase and 118 in the test phase, and 8 teams ultimately submitted their models and fact sheets. In every track, each participating team proposed a method that outperforms the baseline, contributing to progress in all three fields.
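The abstract compares every method against a track baseline but does not state the metric here. Challenges of this kind conventionally rank submissions by correlation between predicted scores and subjective mean opinion scores (MOS); a minimal scoring sketch, assuming SROCC/PLCC as the criteria and using hypothetical data rather than challenge results:

```python
# Minimal sketch of the correlation metrics commonly used to score quality
# assessment challenge submissions (assumed here, not quoted from the paper):
# SROCC measures rank agreement with MOS, PLCC measures linear agreement.
import numpy as np
from scipy.stats import pearsonr, spearmanr

def evaluate(predicted: np.ndarray, mos: np.ndarray) -> dict:
    srocc, _ = spearmanr(predicted, mos)  # rank (monotonic) correlation
    plcc, _ = pearsonr(predicted, mos)    # linear correlation
    return {"SROCC": float(srocc), "PLCC": float(plcc)}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mos = rng.uniform(1, 5, size=200)               # hypothetical subjective scores
    predicted = mos + rng.normal(0, 0.4, size=200)  # hypothetical model predictions
    print(evaluate(predicted, mos))
```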
Problem

Research questions and friction points this paper is trying to address.

Assessing the quality of user-generated videos with the FineVD-GC dataset
Evaluating AI-generated videos from 11 text-to-video (T2V) models via Q-Eval-Video
Measuring talking-head video quality with the THQA-NTIRE dataset (a generic baseline of this kind is sketched below)
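For orientation, a generic no-reference video quality baseline of the kind these tracks invite (sample frames, pool pretrained image features, regress one score) might look like the following. The architecture and all names are illustrative assumptions, not any team's submission or the challenge baseline.

```python
# Minimal sketch of a generic no-reference video quality model: score each
# sampled frame with a pretrained image backbone, then average over time.
import torch
import torch.nn as nn
from torchvision.models import ResNet18_Weights, resnet18

class SimpleNRVQA(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = resnet18(weights=ResNet18_Weights.DEFAULT)  # pretrained features
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # drop FC head
        self.head = nn.Linear(512, 1)  # map pooled features to a quality score

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        # video: (batch, frames, 3, H, W) of sampled frames
        b, t, c, h, w = video.shape
        feats = self.features(video.reshape(b * t, c, h, w)).flatten(1)  # (b*t, 512)
        scores = self.head(feats).reshape(b, t)                          # per-frame scores
        return scores.mean(dim=1)                                        # one score per clip

if __name__ == "__main__":
    model = SimpleNRVQA().eval()
    clip = torch.rand(2, 8, 3, 224, 224)  # two dummy clips, 8 frames each
    with torch.no_grad():
        print(model(clip))
```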
Innovation

Methods, ideas, or system contributions that make the work stand out.

FineVD-GC for user-generated video quality assessment
Q-Eval-Video for AI-generated video quality evaluation
THQA-NTIRE for talking-head quality analysis
👥 Authors
Xiaohong Liu, Xiongkuo Min, Qiang Hu, Xiaoyun Zhang, Jie Guo, Guangtao Zhai, Shushi Wang, Yingjie Zhou, Lu Liu, Jingxin Li, Liu Yang, Farong Wen, Li Xu, Yanwei Jiang, Xilei Zhu, Chunyi Li, Zicheng Zhang, Huiyu Duan, Xiele Wu, Yixuan Gao, Yuqin Cao, Jun Jia, Wei Sun, Jiezhang Cao, R. Timofte, Baojun Li, Jiamian Huang, Dan Luo, Tao Liu, Weixia Zhang, Bingkun Zheng, Junlin Chen, Ruikai Zhou, Meiya Chen, Yu Wang, Hao Jiang, Xiantao Li, Yu-Xi Jiang, Jun Tang, Yimeng Zhao, Bo Hu, Zelu Qi, Chaoyang Zhang, Fei Zhao, Ping Shi, Li Fu, Heng Cong, Shuai He, Rongyu Zhang, Jiarong He, Zongyao Hu, Wei Luo, Zihao Yu, Fengbin Guan, Yiting Lu, Xin Li, Zhibo Chen, Meng-Yuan Su, Yi Wang, Tuo Chen, Chunxiao Li, Shuaiyu Zhao, Jiaxin Wen, Chuyi Lin, Sitong Liu, Ningxin Chu, Jing Wan, Yu Zhou, Baoying Chen, Jishen Zeng, Jiarui Liu, Xianjin Liu, Xin Chen, Lanzhi Zhou, Hangyu Li, You Han, Bibo Xiang, Zhenjie Liu, Jianzhang Lu, Jialin Gui, Renjie Lu, Shangfei Wang, Donghao Zhou, Jingyu Lin, Quanjian Song, Jiancheng Huang, Yufeng Yang, Chan Wang, Shupeng Zhong, Yang Yang, Lihuo He, Jia Liu, Yu Xing, Tida Fang, Yuchun Jin