🤖 AI Summary
This paper overviews a community challenge on predicting user engagement with short-form social media videos, focusing on the key determinants of UGC popularity. The challenge is built on a new short-form UGC dataset whose supervision signals are derived from real-world platform interaction logs, and it invites robust modeling strategies for engagement prediction. Participants explored multimodal approaches that jointly leverage visual and audio content together with creator-provided metadata. The challenge attracted 97 researchers and yielded 15 valid test submissions, advancing both the theoretical understanding and the practical deployment of short-video user behavior modeling.
📝 Abstract
This paper presents an overview of the VQualA 2025 Challenge on Engagement Prediction for Short Videos, held in conjunction with ICCV 2025. The challenge focuses on understanding and modeling the popularity of user-generated content (UGC) short videos on social media platforms. To support this goal, the challenge uses a new short-form UGC dataset featuring engagement metrics derived from real-world user interactions. The objective of the challenge is to promote robust modeling strategies that capture the complex factors influencing user engagement. Participants explored a variety of multimodal features, including visual content, audio, and metadata provided by creators. The challenge attracted 97 participants and received 15 valid test submissions, contributing significantly to progress in short-form UGC video engagement prediction.
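The multimodal setup described above can be sketched in miniature: fuse precomputed per-modality feature vectors by concatenation and regress an engagement score on them. This is a minimal illustrative sketch, not the challenge's actual pipeline; all feature names, dimensions, and the synthetic data are assumptions, and the regressor is plain closed-form ridge regression standing in for the deep models participants actually used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic precomputed per-video features (dimensions are illustrative).
n_videos = 200
visual = rng.normal(size=(n_videos, 16))    # e.g. pooled frame embeddings
audio = rng.normal(size=(n_videos, 8))      # e.g. averaged spectrogram features
metadata = rng.normal(size=(n_videos, 4))   # e.g. follower count, post hour

# Early (concatenation) fusion of the three modalities.
X = np.hstack([visual, audio, metadata])

# Synthetic engagement target (e.g. a log-scaled view count).
true_w = rng.normal(size=X.shape[1])
y = X @ true_w + 0.1 * rng.normal(size=n_videos)

# Ridge regression in closed form: w = (X^T X + lam*I)^{-1} X^T y.
lam = 1.0
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
pred = X @ w

# Sanity check: fused predictions should correlate with engagement.
corr = np.corrcoef(pred, y)[0, 1]
print(f"fused-feature correlation with engagement: {corr:.3f}")
```

In practice the fusion step is where methods differ most: simple concatenation ignores cross-modal interactions, which is why stronger entries typically learn joint representations instead.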