🤖 AI Summary
This paper overviews a community challenge on predicting user engagement with short-form social media videos, focusing on the key determinants of UGC popularity. The challenge is built on a new short-form UGC dataset whose supervision signals are derived from real-world platform interaction logs, and it invites robust modeling strategies for engagement prediction. Participants explored multimodal approaches that jointly leverage visual and audio content together with creator-provided metadata. The challenge attracted 97 researchers and yielded 15 valid test submissions, advancing both the theoretical understanding and the practical deployment of short-video user behavior modeling.
📝 Abstract
This paper presents an overview of the VQualA 2025 Challenge on Engagement Prediction for Short Videos, held in conjunction with ICCV 2025. The challenge focuses on understanding and modeling the popularity of user-generated content (UGC) short videos on social media platforms. To support this goal, the challenge uses a new short-form UGC dataset featuring engagement metrics derived from real-world user interactions. The objective of the challenge is to promote robust modeling strategies that capture the complex factors influencing user engagement. Participants explored a variety of multimodal features, including visual content, audio, and metadata provided by creators. The challenge attracted 97 participants and received 15 valid test submissions, contributing significantly to progress in short-form UGC video engagement prediction.
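The multimodal setup described above can be sketched in miniature: fuse precomputed per-modality feature vectors by concatenation and regress an engagement score on them. This is a minimal illustrative sketch, not the challenge's actual pipeline; all feature names, dimensions, and the synthetic data are assumptions, and the regressor is plain closed-form ridge regression standing in for the deep models participants actually used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic precomputed per-video features (dimensions are illustrative).
n_videos = 200
visual = rng.normal(size=(n_videos, 16))    # e.g. pooled frame embeddings
audio = rng.normal(size=(n_videos, 8))      # e.g. averaged spectrogram features
metadata = rng.normal(size=(n_videos, 4))   # e.g. follower count, post hour

# Early (concatenation) fusion of the three modalities.
X = np.hstack([visual, audio, metadata])

# Synthetic engagement target (e.g. a log-scaled view count).
true_w = rng.normal(size=X.shape[1])
y = X @ true_w + 0.1 * rng.normal(size=n_videos)

# Ridge regression in closed form: w = (X^T X + lam*I)^{-1} X^T y.
lam = 1.0
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
pred = X @ w

# Sanity check: fused predictions should correlate with engagement.
corr = np.corrcoef(pred, y)[0, 1]
print(f"fused-feature correlation with engagement: {corr:.3f}")
```

In practice the fusion step is where methods differ most: simple concatenation ignores cross-modal interactions, which is why stronger entries typically learn joint representations instead.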