NTIRE 2025 XGC Quality Assessment Challenge: Methods and Results

📅 2025-06-03
📈 Citations: 1
Influential citations: 0
🤖 AI Summary
This work reports on a challenge for assessing the perceptual quality of user-generated videos (UGVs), AI-generated videos (AIGVs), and talking-head videos (THVs), together forming a cross-source, no-reference video quality assessment benchmark. The challenge builds on three datasets: FineVD-GC (fine-grained distortion annotations for user-generated videos), Q-Eval-Video (multi-dimensional subjective quality scores for AI-generated videos), and THQA-NTIRE (talking-head-specific quality annotations). Methodologically, participating teams combined multi-scale spatiotemporal modeling, artifact-aware networks, cross-domain transfer learning, and ensemble distillation for end-to-end quality prediction. Every finalist method in every track surpassed the baseline; out of more than 1,300 submissions across the development and test phases, 5, 6, and 8 teams in the three tracks respectively delivered final models with fact sheets. This advances the state of the art in quality assessment for emerging generative video content, particularly AIGVs.
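The summary names ensembling among the teams' techniques without giving detail. Purely as an illustration (the page contains no code, and the arrays, weights, and function names below are hypothetical, not any team's implementation), score-level fusion of several no-reference quality predictors can be sketched in Python as:

```python
# Minimal sketch of score-level ensembling for no-reference video quality
# prediction. All data below is synthetic and stands in for real model output.
from __future__ import annotations

import numpy as np

def zscore(x: np.ndarray) -> np.ndarray:
    """Standardize one model's scores so models on different scales can be fused."""
    return (x - x.mean()) / (x.std() + 1e-8)

def ensemble_scores(per_model_scores: list[np.ndarray],
                    weights: list[float] | None = None) -> np.ndarray:
    """Weighted average of z-normalized per-model quality predictions."""
    stacked = np.stack([zscore(s) for s in per_model_scores])  # (n_models, n_videos)
    w = np.ones(len(per_model_scores)) if weights is None else np.asarray(weights, float)
    w = w / w.sum()
    return w @ stacked  # one fused score per video

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mos = rng.uniform(1, 5, size=100)  # hypothetical ground-truth MOS
    preds = [mos + rng.normal(0, s, size=100) for s in (0.3, 0.5, 0.8)]  # three noisy "models"
    print(ensemble_scores(preds, weights=[0.5, 0.3, 0.2])[:5])
```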

📝 Abstract
This paper reports on the NTIRE 2025 XGC Quality Assessment Challenge, held in conjunction with the New Trends in Image Restoration and Enhancement (NTIRE) workshop at CVPR 2025. The challenge addresses a major problem in video and talking-head processing and is divided into three tracks: user-generated video, AI-generated video, and talking head. The user-generated video track uses FineVD-GC, which contains 6,284 user-generated videos. This track attracted 125 registered participants; 242 submissions were received in the development phase and 136 in the test phase, and 5 teams ultimately submitted their models and fact sheets. The AI-generated video track uses Q-Eval-Video, which contains 34,029 AI-generated videos (AIGVs) produced by 11 popular text-to-video (T2V) models. This track attracted 133 registered participants; 396 submissions were received in the development phase and 226 in the test phase, and 6 teams ultimately submitted their models and fact sheets. The talking head track uses THQA-NTIRE, which contains 12,247 2D and 3D talking heads. This track attracted 89 registered participants; 225 submissions were received in the development phase and 118 in the test phase, and 8 teams ultimately submitted their models and fact sheets. In every track, each participating team proposed a method that outperforms the baseline, contributing to progress in all three fields.
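The abstract compares every method against a track baseline but does not state the metric here. Challenges of this kind conventionally rank submissions by correlation between predicted scores and subjective mean opinion scores (MOS); a minimal scoring sketch, assuming SROCC/PLCC as the criteria and using hypothetical data rather than challenge results:

```python
# Minimal sketch of the correlation metrics commonly used to score quality
# assessment challenge submissions (assumed here, not quoted from the paper):
# SROCC measures rank agreement with MOS, PLCC measures linear agreement.
import numpy as np
from scipy.stats import pearsonr, spearmanr

def evaluate(predicted: np.ndarray, mos: np.ndarray) -> dict:
    srocc, _ = spearmanr(predicted, mos)  # rank (monotonic) correlation
    plcc, _ = pearsonr(predicted, mos)    # linear correlation
    return {"SROCC": float(srocc), "PLCC": float(plcc)}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mos = rng.uniform(1, 5, size=200)               # hypothetical subjective scores
    predicted = mos + rng.normal(0, 0.4, size=200)  # hypothetical model predictions
    print(evaluate(predicted, mos))
```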
Problem

Research questions and friction points this paper is trying to address.

Assessing the quality of user-generated videos with the FineVD-GC dataset
Evaluating AI-generated videos from 11 text-to-video (T2V) models via Q-Eval-Video
Measuring talking-head video quality with the THQA-NTIRE dataset (a generic baseline of this kind is sketched below)
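For orientation, a generic no-reference video quality baseline of the kind these tracks invite (sample frames, pool pretrained image features, regress one score) might look like the following. The architecture and all names are illustrative assumptions, not any team's submission or the challenge baseline.

```python
# Minimal sketch of a generic no-reference video quality model: score each
# sampled frame with a pretrained image backbone, then average over time.
import torch
import torch.nn as nn
from torchvision.models import ResNet18_Weights, resnet18

class SimpleNRVQA(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = resnet18(weights=ResNet18_Weights.DEFAULT)  # pretrained features
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # drop FC head
        self.head = nn.Linear(512, 1)  # map pooled features to a quality score

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        # video: (batch, frames, 3, H, W) of sampled frames
        b, t, c, h, w = video.shape
        feats = self.features(video.reshape(b * t, c, h, w)).flatten(1)  # (b*t, 512)
        scores = self.head(feats).reshape(b, t)                          # per-frame scores
        return scores.mean(dim=1)                                        # one score per clip

if __name__ == "__main__":
    model = SimpleNRVQA().eval()
    clip = torch.rand(2, 8, 3, 224, 224)  # two dummy clips, 8 frames each
    with torch.no_grad():
        print(model(clip))
```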
Innovation

Methods, ideas, or system contributions that make the work stand out.

FineVD-GC for user-generated video quality assessment
Q-Eval-Video for AI-generated video quality evaluation
THQA-NTIRE for talking-head quality analysis
👥 Authors
Xiaohong Liu, Xiongkuo Min, Qiang Hu, Xiaoyun Zhang, Jie Guo, Guangtao Zhai, Shushi Wang, Yingjie Zhou, Lu Liu, Jingxin Li, Liu Yang, Farong Wen, Li Xu, Yanwei Jiang, Xilei Zhu, Chunyi Li, Zicheng Zhang, Huiyu Duan, Xiele Wu, Yixuan Gao, Yuqin Cao, Jun Jia, Wei Sun, Jiezhang Cao, R. Timofte, Baojun Li, Jiamian Huang, Dan Luo, Tao Liu, Weixia Zhang, Bingkun Zheng, Junlin Chen, Ruikai Zhou, Meiya Chen, Yu Wang, Hao Jiang, Xiantao Li, Yu-Xi Jiang, Jun Tang, Yimeng Zhao, Bo Hu, Zelu Qi, Chaoyang Zhang, Fei Zhao, Ping Shi, Li Fu, Heng Cong, Shuai He, Rongyu Zhang, Jiarong He, Zongyao Hu, Wei Luo, Zihao Yu, Fengbin Guan, Yiting Lu, Xin Li, Zhibo Chen, Meng-Yuan Su, Yi Wang, Tuo Chen, Chunxiao Li, Shuaiyu Zhao, Jiaxin Wen, Chuyi Lin, Sitong Liu, Ningxin Chu, Jing Wan, Yu Zhou, Baoying Chen, Jishen Zeng, Jiarui Liu, Xianjin Liu, Xin Chen, Lanzhi Zhou, Hangyu Li, You Han, Bibo Xiang, Zhenjie Liu, Jianzhang Lu, Jialin Gui, Renjie Lu, Shangfei Wang, Donghao Zhou, Jingyu Lin, Quanjian Song, Jiancheng Huang, Yufeng Yang, Chan Wang, Shupeng Zhong, Yang Yang, Lihuo He, Jia Liu, Yu Xing, Tida Fang, Yuchun Jin