Just Another Hour on TikTok: Reverse-engineering unique identifiers to obtain a complete slice of TikTok

📅 2025-04-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses sampling bias in TikTok data collection, which arises from API restrictions and the opacity of the platform's recommendation algorithm. To overcome these challenges, we propose a reverse-engineering methodology that combines HTTP traffic analysis, ID-space inference, and temporal feature modeling to enable platform-wide, high-coverage, reproducible sampling. Using distributed crawler orchestration and synchronized video and comment acquisition, our approach collects over 99% of the posts from one full hour, plus over 99% of the posts from a single minute in each hour of a 24-hour day. We thus construct the first high-quality, temporally resolved dataset of videos, metadata, and comments spanning an entire day. Based on this dataset, we estimate that 117 million posts were published on TikTok on the day studied, substantially revising upward prior empirical benchmarks. This work establishes a novel methodological foundation for large-scale empirical research on social media platforms.

📝 Abstract

TikTok is now a massive platform and has a deep impact on global events. But for all the preliminary studies done on it, fundamental characteristics of the platform remain hard to determine. We develop a method to extract a representative sample from a specific time range on TikTok, and use it to collect >99% of posts from a full hour on the platform, alongside a dataset of >99% of posts from a single minute from each hour of a day. Through this, we obtain post metadata, video media data, and comments from a close-to-complete slice of TikTok. Using this dataset, we report critical statistics of the platform, notably estimating that a total of 117 million posts were produced on TikTok on the day we examined.
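
The step from one sampled minute per hour to a daily total can be illustrated with a small sketch. It is not the authors' estimator: the function name `estimate_daily_posts`, the `coverage` parameter, and the counts in the example are hypothetical, and the sketch assumes each sampled minute is representative of the other 59 minutes in its hour.

```python
def estimate_daily_posts(minute_counts, coverage=1.0):
    """Extrapolate a daily post total from one sampled minute per hour.

    minute_counts: 24 post counts, one sampled minute for each hour of the day.
    coverage: assumed fraction of posts actually captured in each sampled
              minute (hypothetical parameter; 1.0 means no correction).
    """
    if len(minute_counts) != 24:
        raise ValueError("expected one count per hour (24 values)")
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    # Each sampled minute stands in for all 60 minutes of its hour.
    return sum(60 * count / coverage for count in minute_counts)


# Toy counts, NOT values from the paper: a quiet half-day and a busy half-day.
toy_counts = [1000] * 12 + [2000] * 12
toy_total = estimate_daily_posts(toy_counts)
```

For scale, 117 million posts in a day corresponds to an average of about 81,000 posts per minute (117,000,000 / 1,440 = 81,250).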

Problem

Research questions and friction points this paper is trying to address.

Extracting representative TikTok samples from specific time ranges

Collecting comprehensive post metadata and video data

Estimating critical platform statistics like daily post volume

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reverse-engineering TikTok unique identifiers

Extracting representative time-based samples

Collecting near-complete post metadata and comments
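
A minimal sketch of how identifier structure can map a time range to a range of IDs. This summary does not describe the ID layout, so the sketch assumes the commonly reported one: a 64-bit post ID whose upper 32 bits hold the creation time as a Unix timestamp in seconds. The names `TIMESTAMP_SHIFT`, `id_to_datetime`, and `id_bounds` are hypothetical, and the lower 32 bits are treated as opaque.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the upper 32 bits of a 64-bit post ID are a Unix timestamp
# in seconds; the meaning of the lower 32 bits is not modeled here.
TIMESTAMP_SHIFT = 32


def id_to_datetime(post_id):
    """Recover the assumed creation time (UTC) from a post ID."""
    return datetime.fromtimestamp(post_id >> TIMESTAMP_SHIFT, tz=timezone.utc)


def id_bounds(start, end):
    """Return the inclusive (lo, hi) ID range for posts created in [start, end)."""
    lo = int(start.timestamp()) << TIMESTAMP_SHIFT
    hi = (int(end.timestamp()) << TIMESTAMP_SHIFT) - 1
    return lo, hi


# Example: the ID range covering one minute of an arbitrary, illustrative date.
minute_start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
minute_end = minute_start + timedelta(minutes=1)
lo_id, hi_id = id_bounds(minute_start, minute_end)
```

Under this assumption, a one-minute slice spans 60 × 2^32 candidate IDs, which is why knowing how the remaining bits are populated matters for enumerating a slice in practice.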
