MVP: Winning Solution to SMP Challenge 2025 Video Track

📅 2025-07-01

🤖 AI Summary

This study addresses social media video popularity prediction by proposing the Multimodal Video Predictor (MVP). MVP integrates deep video features extracted from pretrained models with user metadata and contextual information to build post representations. It applies systematic preprocessing, including log transformation and outlier removal, and trains a gradient-boosted regression model to capture complex patterns across modalities. The key contributions are (i) a multimodal feature pipeline that combines video content, user metadata, and contextual information, and (ii) a preprocessing procedure aimed at improving model robustness. MVP ranked first in the official evaluation of the SMP Challenge 2025 Video Track.

📝 Abstract

Social media platforms serve as central hubs for content dissemination, opinion expression, and public engagement across diverse modalities. Accurately predicting the popularity of social media videos enables valuable applications in content recommendation, trend detection, and audience engagement. In this paper, we present Multimodal Video Predictor (MVP), our winning solution to the Video Track of the SMP Challenge 2025. MVP constructs expressive post representations by integrating deep video features extracted from pretrained models with user metadata and contextual information. The framework applies systematic preprocessing techniques, including log-transformations and outlier removal, to improve model robustness. A gradient-boosted regression model is trained to capture complex patterns across modalities. Our approach ranked first in the official evaluation of the Video Track, demonstrating its effectiveness and reliability for multimodal video popularity prediction on social platforms. The source code is available at https://anonymous.4open.science/r/SMPDVideo.
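
The abstract names log-transformations and outlier removal but does not say which columns or which outlier rule are used. The following is a minimal sketch under stated assumptions: the field names (`followers`, `views`), the choice of `log1p`, and the Tukey IQR fence with `k=1.5` are illustrative choices, not details confirmed by the paper.

```python
import math
import statistics

def log_transform(counts):
    """Compress heavy-tailed counts with log(1 + x), which is defined at x = 0."""
    return [math.log1p(c) for c in counts]

def iqr_bounds(values, k=1.5):
    """Return (low, high) Tukey fences; k=1.5 is an assumed, conventional value."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def filter_outliers(rows, key, k=1.5):
    """Keep only the rows whose `key` value lies inside the IQR fences."""
    low, high = iqr_bounds([row[key] for row in rows], k)
    return [row for row in rows if low <= row[key] <= high]

# Hypothetical toy posts: one video has an extreme view count.
posts = [
    {"followers": 150, "views": 100},
    {"followers": 900, "views": 120},
    {"followers": 80, "views": 90},
    {"followers": 400, "views": 110},
    {"followers": 300, "views": 105},
    {"followers": 60, "views": 95},
    {"followers": 5000, "views": 10_000_000},
]

# Add log-scaled columns, then drop rows whose log-views fall outside the fences.
log_views = log_transform([p["views"] for p in posts])
log_followers = log_transform([p["followers"] for p in posts])
for post, lv, lf in zip(posts, log_views, log_followers):
    post["log_views"] = lv
    post["log_followers"] = lf

clean = filter_outliers(posts, key="log_views")
print(len(posts), "->", len(clean), "posts after outlier removal")
```

Filtering on the log scale rather than the raw scale is a design choice in this sketch: it keeps moderately popular videos and removes only the extreme tail.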

Problem

Research questions and friction points this paper is trying to address.

Predicting popularity of social media videos accurately

Integrating multimodal features for video representation

Improving robustness in popularity prediction models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates deep video features with metadata

Applies systematic preprocessing for robustness

Uses gradient-boosted regression for multimodal patterns
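
To make the combination of feature fusion and gradient boosting concrete, here is a small, dependency-free sketch. It is not the authors' implementation: the source does not name a boosting library, so this hand-written booster uses depth-1 trees (stumps) and squared error as a stand-in, and the embeddings, metadata values, and targets are invented toy numbers.

```python
def fuse(video_embedding, metadata):
    """Concatenate per-post feature groups into one flat feature vector."""
    return list(video_embedding) + list(metadata)

def fit_stump(X, residuals):
    """Find the single-feature threshold split that minimises squared error."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[f] <= t]
            right = [r for row, r in zip(X, residuals) if row[f] > t]
            if not left or not right:
                continue  # a split must put at least one row on each side
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, lm, rm)
    return best[1:] if best else None

def stump_predict(stump, row):
    f, t, lm, rm = stump
    return lm if row[f] <= t else rm

def fit_boosting(X, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared error: each stump fits the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Residuals are the negative gradient of 0.5 * (y - pred)^2.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break  # no valid split exists (all features constant)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, row)
                 for p, row in zip(preds, X)]
    return base, stumps, learning_rate

def predict(model, row):
    base, stumps, lr = model
    return base + lr * sum(stump_predict(s, row) for s in stumps)

def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

# Hypothetical toy data: (video embedding, [log followers], log popularity).
samples = [
    ([0.1, 0.9], [2.0], 3.0),
    ([0.2, 0.8], [2.5], 3.4),
    ([0.8, 0.1], [4.0], 6.1),
    ([0.9, 0.2], [4.5], 6.8),
    ([0.5, 0.5], [3.0], 4.9),
    ([0.4, 0.6], [3.2], 4.5),
]
X = [fuse(emb, meta) for emb, meta, _ in samples]
y = [target for _, _, target in samples]

model = fit_boosting(X, y)
baseline_mse = mse(y, [sum(y) / len(y)] * len(y))
model_mse = mse(y, [predict(model, row) for row in X])
print(f"baseline MSE {baseline_mse:.3f}, boosted MSE {model_mse:.3f}")
```

In practice a library implementation of gradient-boosted trees would replace the hand-written booster; the sketch only shows how fused features feed a residual-fitting ensemble.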
