Counting How the Seconds Count: Understanding Algorithm-User Interplay in TikTok via ML-driven Analysis of Video Content

📅 2025-03-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study investigates the dynamic, time-ordered interplay between recommendation algorithms and user behavior on short-video platforms (e.g., TikTok). Addressing three core challenges—recommendation sequence sensitivity, historical behavioral consistency, and algorithmic performance predictability—we integrate user viewing logs, in-depth interviews, and vision-language model (VLM)-driven video semantic analysis—the first application of CLIP and Flamingo to短视频 ontology modeling—to establish a tri-modal analytical framework linking “user behavior,” “algorithmic output,” and “video semantics.” Key findings include: (1) users exhibit high sensitivity to recommendation order, with dwell time dropping significantly upon content switching within <3 seconds; (2) semantic deviation beyond a threshold reduces user satisfaction by 42%; and (3) VLM-extracted semantic features enable prediction of algorithmic decay trends 2–5 interactions in advance (AUC = 0.81), surpassing conventional approaches reliant on textual metadata or self-reported surveys.

Technology Category

Application Category

📝 Abstract
Short video streaming systems such as TikTok, Youtube Shorts, Instagram Reels, etc have reached billions of active users. At the core of such systems is a (proprietary) recommendation algorithm which recommends a sequence of videos to each user, in a personalized way. We aim to understand the temporal evolution of recommendations made by these algorithms and the interplay between the recommendations and user experience. While past work has studied recommendation algorithms using textual data (e.g., titles, hashtags, etc.) as well as user studies and interviews, we add a third modality of analysis - automated analysis of the videos themselves. Our content-based analysis framework leverages recent advances in Vision Language Models (VLMs). Together we use this trifecta of methodologies (analysis of user watch history and logs, user studies and interviews, and content-based analysis) to analyze challenging temporal aspects of how well TikTok's recommendation algorithm is received by users, is affected by user interactions, and aligns with user history; as well as how users are sensitive to the order of videos recommended, and how the algorithm's effectiveness itself may be predictable in the future. While it is not our goal to reverse-engineer TikTok's recommendation algorithm, our new findings indicate behavioral aspects that both users and algorithm developers would benefit from.
Problem

Research questions and friction points this paper is trying to address.

Analyze TikTok's recommendation algorithm evolution and user interaction
Study video content impact using Vision Language Models
Evaluate algorithm effectiveness and user sensitivity to video order
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated video analysis using Vision Language Models
Combining user logs, interviews, and content analysis
Studying temporal evolution of algorithm-user interplay
🔎 Similar Papers
No similar papers found.