🤖 AI Summary
Live-streaming recommendation research is hindered by the lack of publicly available datasets that reflect the dynamic nature of live streaming. To address this, we introduce KuaiLive, the first real-time, interactive live-streaming dataset, collected from Kuaishou. It records the interaction logs of 23,772 users and 452,621 streamers over 21 days, with precise start and end timestamps for each live room, four types of real-time interaction (click, comment, like, gift), and rich side information for both users and streamers. These features support realistic simulation of dynamic candidate sets and a range of tasks, including top-K recommendation, click-through rate prediction, watch time prediction, and gift price prediction, as well as research on multi-behavior modeling, multi-task learning, and fairness-aware recommendation. We analyze the dataset from multiple perspectives and evaluate several representative recommendation methods on it, establishing a benchmark for future work. The dataset is publicly released to support research on live-streaming recommendation.
📝 Abstract
Live streaming platforms have become a dominant form of online content consumption, offering dynamically evolving content, real-time interactions, and highly engaging user experiences. These unique characteristics introduce new challenges that differentiate live streaming recommendation from traditional recommendation settings and have garnered increasing attention from industry in recent years. However, research progress in academia has been hindered by the lack of publicly available datasets that accurately reflect the dynamic nature of live streaming environments. To address this gap, we introduce KuaiLive, the first real-time, interactive dataset collected from Kuaishou, a leading live streaming platform in China with over 400 million daily active users. The dataset records the interaction logs of 23,772 users and 452,621 streamers over a 21-day period. Compared to existing datasets, KuaiLive offers several advantages: it includes precise live room start and end timestamps, multiple types of real-time user interactions (click, comment, like, gift), and rich side information features for both users and streamers. These features enable more realistic simulation of dynamic candidate items and better modeling of user and streamer behaviors. We conduct a thorough analysis of KuaiLive from multiple perspectives and evaluate several representative recommendation methods on it, establishing a strong benchmark for future research. KuaiLive can support a wide range of tasks in the live streaming domain, such as top-K recommendation, click-through rate prediction, watch time prediction, and gift price prediction. Moreover, its fine-grained behavioral data also enables research on multi-behavior modeling, multi-task learning, and fairness-aware recommendation. The dataset and related resources are publicly available at https://imgkkk574.github.io/KuaiLive.
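To illustrate what live room start and end timestamps make possible, the following is a minimal sketch of building a dynamic candidate set: at recommendation time, only rooms that are currently live are eligible. The `LiveRoom` record, its field names, and the `active_rooms` helper are hypothetical and are not taken from the released KuaiLive schema; timestamps are assumed to be epoch seconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LiveRoom:
    # Hypothetical record layout; the real dataset's columns may differ.
    live_id: int
    streamer_id: int
    start_ts: int  # room start time, epoch seconds (assumed)
    end_ts: int    # room end time, epoch seconds (assumed)


def active_rooms(rooms, ts):
    """Return the rooms that are live at time ts (start inclusive, end exclusive)."""
    return [r for r in rooms if r.start_ts <= ts < r.end_ts]


# Toy data for illustration only.
rooms = [
    LiveRoom(live_id=1, streamer_id=10, start_ts=0, end_ts=100),
    LiveRoom(live_id=2, streamer_id=11, start_ts=50, end_ts=150),
    LiveRoom(live_id=3, streamer_id=12, start_ts=200, end_ts=300),
]

# Candidate set for a request arriving at ts=60: rooms 1 and 2 are live, room 3 is not.
candidates = active_rooms(rooms, 60)
candidate_ids = [r.live_id for r in candidates]
```

A linear scan is enough for a toy example; for a dataset of this size, an interval index or a time-sorted sweep would likely be a better fit.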