ICME 2025 Grand Challenge on Video Super-Resolution for Video Conferencing

📅 2025-06-13
🤖 AI Summary
This challenge report addresses low-latency, causal video super-resolution (VSR) for video conferencing, where low-resolution (LR) inputs are compressed with H.265 at fixed QPs. The goal is to improve the perceptual quality of three types of LR video: general-purpose videos, talking-head videos, and screen content. The organizers provided separate training, validation, and test sets for each track and open-sourced a new screen-content dataset for SR. Submissions were evaluated through subjective tests run with a crowdsourced implementation of ITU-T Rec. P.910.

📝 Abstract
Super-Resolution (SR) is a critical task in computer vision, focusing on reconstructing high-resolution (HR) images from low-resolution (LR) inputs. The field has seen significant progress through various challenges, particularly in single-image SR. Video Super-Resolution (VSR) extends this to the temporal domain, aiming to enhance video quality using methods like local, uni-, bi-directional propagation, or traditional upscaling followed by restoration. This challenge addresses VSR for conferencing, where LR videos are encoded with H.265 at fixed QPs. The goal is to upscale videos by a specific factor, providing HR outputs with enhanced perceptual quality under a low-delay scenario using causal models. The challenge included three tracks: general-purpose videos, talking head videos, and screen content videos, with separate datasets provided by the organizers for training, validation, and testing. We open-sourced a new screen content dataset for the SR task in this challenge. Submissions were evaluated through subjective tests using a crowdsourced implementation of the ITU-T Rec P.910.
Problem

Research questions and friction points this paper is trying to address.

Enhance video quality for conferencing using super-resolution
Upscale low-resolution videos with H.265 encoding
Evaluate methods on diverse video tracks and datasets
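
The abstract says submissions were scored with a crowdsourced ITU-T Rec. P.910 test but gives no detail on how votes were aggregated. As a rough illustration only, the sketch below computes a mean opinion score (MOS) with a normal-approximation 95% confidence interval per clip. It assumes a 5-point absolute category rating scale, which is one of the methods P.910 describes; the challenge's actual protocol may differ. The function name `mos_with_ci`, the clip names, and the votes are all hypothetical.

```python
import math
import statistics


def mos_with_ci(ratings, z=1.96):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    Assumption: ratings are integer votes on a 1-5 scale and the normal
    approximation (z = 1.96) is acceptable for the interval.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate a spread")
    mean = statistics.fmean(ratings)
    spread = statistics.stdev(ratings)  # sample standard deviation
    return mean, z * spread / math.sqrt(len(ratings))


# Hypothetical votes per processed clip; not data from the challenge.
votes_per_clip = {
    "clip_a": [4, 5, 4, 3, 4],
    "clip_b": [2, 3, 2, 2, 3],
}

for clip, votes in votes_per_clip.items():
    mos, half_width = mos_with_ci(votes)
    print(f"{clip}: MOS = {mos:.2f} +/- {half_width:.2f}")
```

With only a handful of votes per clip the interval is wide, which is why crowdsourced tests usually collect many ratings per clip before ranking systems.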
Innovation

Methods, ideas, or system contributions that make the work stand out.

Upscale videos using causal models
H.265 encoded LR video enhancement
Open-sourced screen content dataset
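
To make the "causal" constraint concrete: in a low-delay setting, the output for frame t may depend only on frames up to and including t, never on future frames. The sketch below is a toy illustration of that constraint, not any submission's method. Nearest-neighbor upscaling and an exponential moving average stand in for a learned network and its recurrent state; `upscale_nearest`, `causal_vsr`, `scale`, and `alpha` are hypothetical names and parameters.

```python
def upscale_nearest(frame, scale):
    """Repeat each pixel `scale` times horizontally and vertically."""
    return [
        [pixel for pixel in row for _ in range(scale)]
        for row in frame
        for _ in range(scale)
    ]


def causal_vsr(frames, scale=2, alpha=0.8):
    """Yield one upscaled frame per input frame, using only past and present.

    `state` carries information forward from earlier frames; no future
    frame is ever read, so each output is available as soon as its input
    arrives.
    """
    state = None
    for lr_frame in frames:
        hr_frame = upscale_nearest(lr_frame, scale)
        if state is None:
            state = hr_frame
        else:
            # Blend the current frame with the running state (placeholder
            # for learned temporal propagation).
            state = [
                [alpha * cur + (1 - alpha) * prev for cur, prev in zip(cur_row, prev_row)]
                for cur_row, prev_row in zip(hr_frame, state)
            ]
        yield state


# Three tiny 2x3 grayscale frames with constant intensity (toy data).
lr_video = [[[value] * 3 for _ in range(2)] for value in (0.0, 1.0, 1.0)]
outputs = list(causal_vsr(lr_video, scale=2))
print(len(outputs[0]), len(outputs[0][0]))  # output height and width
```

Because the function is a generator over a frame stream, replacing a later frame cannot change any earlier output, which is the property a bidirectional-propagation model lacks.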
Authors

Babak Naderi
Microsoft
DNN in Quality of Experience, Video Enhancement, Video/Speech QoE, Statistical Modeling
Ross Cutler
Microsoft
Computer Vision, Machine Learning, Acoustics, Optics, VoIP
Juhee Cho
Microsoft Corporation
Nabakumar Khongbantabam
Microsoft Corporation
Dejan Ivkovic
Microsoft Corporation