AI Summary
This study addresses the lack of a systematic understanding of how the multimodal features of short videos jointly shape users' sensory experience and behavioral engagement. Grounded in Message Sensation Value (MSV) theory, the work combines multimodal feature extraction, human evaluation of 1,200 short videos, and computational modeling to develop and validate an engagement prediction model that generalizes across platforms. It reports an inverted U-shaped relationship between MSV and behavioral engagement: moderate MSV maximizes user participation, whereas sensory engagement rises with MSV. The model was validated on two unseen datasets drawn from three short-video platforms (combined N = 14,492), where it predicted both sensory and behavioral engagement. These findings provide a theoretical foundation and a practical tool for designing and evaluating short-video content.
Abstract
The contemporary media landscape is characterized by sensational short videos. While prior research examines the effects of individual multimodal features, the collective impact of these features on viewer engagement with short videos remains unknown. Grounded in the theoretical framework of Message Sensation Value (MSV), this study develops and tests a computational model of MSV using multimodal feature analysis and human evaluation of 1,200 short videos. The model, which predicts sensory and behavioral engagement, was further validated on two unseen datasets from three short video platforms (combined N = 14,492). While MSV is positively associated with sensory engagement, it shows an inverted U-shaped relationship with behavioral engagement: Higher MSV elicits stronger sensory stimulation, but moderate MSV optimizes behavioral engagement. This research advances the theoretical understanding of short video engagement and introduces a robust computational tool for short video research.
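An inverted U-shaped relationship of this kind is often examined by adding a squared term to a regression and checking that its coefficient is negative and that the turning point lies inside the observed range. The sketch below illustrates that general check on synthetic data. It is not the study's model or data: the function names, the simulated `msv` and `engagement` values, and the choice of a plain quadratic least-squares fit are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def fit_quadratic(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y = b0 + b1*x + b2*x**2 via the normal equations."""
    # Power sums sum(x**k) for k = 0..4 and moments sum(y * x**k) for k = 0..2.
    s = [sum(xi ** k for xi in x) for k in range(5)]
    t = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]
    # Augmented 3x4 matrix of the normal equations.
    a = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    b0, b1, b2 = (a[i][3] / a[i][i] for i in range(3))
    return b0, b1, b2


def turning_point(b1: float, b2: float) -> float:
    """Location of the extremum of b0 + b1*x + b2*x**2."""
    return -b1 / (2.0 * b2)


# Synthetic illustration only (hypothetical values, not the paper's data):
# engagement peaks at a moderate MSV of 0.5, with a small deterministic wiggle.
msv = [i / 20 for i in range(21)]
engagement = [1.0 + 4.0 * m - 4.0 * m ** 2 + 0.02 * (-1) ** i
              for i, m in enumerate(msv)]

b0, b1, b2 = fit_quadratic(msv, engagement)
peak = turning_point(b1, b2)

# Inverted U: negative curvature and a turning point inside the observed range.
is_inverted_u = b2 < 0 and min(msv) < peak < max(msv)
print(f"b2 = {b2:.3f}, turning point = {peak:.3f}, inverted U: {is_inverted_u}")
```

A negative squared-term coefficient alone is not sufficient evidence of an inverted U, since the curve may only flatten within the observed range. This is why the sketch also checks where the turning point falls.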