🤖 AI Summary
This study addresses the inefficiency of preloading strategies on short-form video platforms, where much of the downloaded data is wasted because users often skip videos early. The authors conduct a measurement study and release a public dataset of users' viewing times for one hundred short videos across various categories. Using this dataset, they evaluate the accuracy and stability of several standard time-series forecasting methods for predicting user viewing time: Auto-ARIMA, autoregressive (AR) models, linear regression (LR), support vector regression (SVR), and decision tree regression (DTR). Auto-ARIMA generally achieves the lowest and most stable forecasting errors across most experimental settings, while the other methods tend to produce higher errors and lower stability in many cases. These findings can inform chunk-based preloading mechanisms that track actual user viewing behavior more closely.
📝 Abstract
Short-form videos have become one of the most popular formats for user-generated content. Popular short-video platforms use a simple streaming approach that preloads one or more videos from the recommendation list in advance. However, this approach wastes a significant amount of data, because a large portion of the downloaded video data is never played when users skip a video early. To address this problem, chunk-based preloading has been proposed: videos are divided into chunks, and preloading is performed chunk by chunk to reduce data wastage. To optimize chunk-based preloading, it is important to understand users' viewing behavior in short-form video streaming. In this paper, we conduct a measurement study to construct a user behavior dataset that contains users' viewing times for one hundred short videos of various categories. Using the dataset, we evaluate the performance of standard time-series forecasting algorithms for predicting user viewing time in short-form video streaming. Our evaluation results show that Auto-ARIMA generally achieves the lowest and most stable forecasting errors across most experimental settings. The remaining methods, namely autoregressive (AR) models, linear regression (LR), support vector regression (SVR), and decision tree regression (DTR), tend to produce higher errors and exhibit lower stability in many cases. The dataset is publicly available at https://nvduc.github.io/shortvideodataset.
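To make the data-wastage argument concrete, the following is a minimal sketch that compares whole-video preloading with chunk-based preloading driven by a viewing-time forecast. It is not the authors' implementation: the function names, the 2-second chunk length, the rule of always preloading at least one chunk, and the example numbers are all assumptions chosen for illustration, and wastage is measured in seconds of video rather than bytes.

```python
import math


def wasted_seconds_full_preload(video_len: float, watched: float) -> float:
    """Seconds downloaded but never played when the whole video is preloaded."""
    watched = min(watched, video_len)
    return video_len - watched


def wasted_seconds_chunked(
    video_len: float, watched: float, chunk_len: float, predicted_watch: float
) -> float:
    """Seconds downloaded but never played under forecast-driven chunk preloading.

    Assumption (hypothetical policy): the client preloads enough chunks to cover
    the predicted viewing time, and fetches further chunks on demand if the user
    keeps watching past the preloaded part.
    """
    watched = min(watched, video_len)
    num_chunks = math.ceil(video_len / chunk_len)
    # Chunks preloaded to cover the forecast (at least one so playback can start).
    preloaded = min(max(math.ceil(predicted_watch / chunk_len), 1), num_chunks)
    # Chunks actually needed to play what the user watched.
    needed = min(max(math.ceil(watched / chunk_len), 1), num_chunks)
    downloaded_chunks = max(preloaded, needed)
    # The last chunk may be shorter than chunk_len, so cap at the video length.
    downloaded = min(downloaded_chunks * chunk_len, video_len)
    return downloaded - watched


# Illustrative numbers only: a 30 s video, the user skips after 7 s.
full_waste = wasted_seconds_full_preload(30.0, 7.0)
good_forecast_waste = wasted_seconds_chunked(30.0, 7.0, 2.0, predicted_watch=8.0)
over_forecast_waste = wasted_seconds_chunked(30.0, 7.0, 2.0, predicted_watch=20.0)
```

Under these assumptions, an accurate forecast leaves only the unplayed tail of the last chunk as waste, while an over-estimate wastes every preloaded chunk beyond the skip point. This is why the forecasting error of the viewing-time predictor matters for chunk-based preloading.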