Just Another Hour on TikTok: Reverse-engineering unique identifiers to obtain a complete slice of TikTok

📅 2025-04-17
🤖 AI Summary
This study addresses sampling bias in TikTok data collection, which arises from API restrictions and the opaque nature of the platform's recommendation algorithm. To overcome these challenges, we propose a reverse-engineering methodology that integrates HTTP traffic analysis, ID-space inference, and temporal feature modeling, enabling platform-wide, high-coverage, and reproducible sampling. Leveraging distributed crawler orchestration and synchronized video and comment acquisition, our approach captures over 99% of the content posted during one full hour and, separately, over 99% of the posts from a single minute of each hour across a full 24-hour cycle. We thus construct the first high-quality, temporally resolved dataset encompassing videos, metadata, and comments spanning an entire day. Based on this dataset, we estimate TikTok’s global posting volume on the sampled day at 117 million posts, substantially revising upward prior empirical benchmarks. This work establishes a novel methodological foundation for large-scale empirical research on social media platforms.

📝 Abstract
TikTok is now a massive platform with a deep impact on global events. Yet despite the preliminary studies done on it, fundamental characteristics of the platform remain hard to determine. We develop a method to extract a representative sample from a specific time range on TikTok, and use it to collect >99% of posts from a full hour on the platform, alongside a dataset of >99% of posts from a single minute of each hour of a day. Through this, we obtain post metadata, video media data, and comments from a close-to-complete slice of TikTok. Using this dataset, we report the critical statistics of the platform, notably estimating that a total of 117 million posts were produced on TikTok on the day we studied.
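
The step from one sampled minute per hour to a daily total is simple arithmetic, and a short sketch may make it concrete. The function name, the coverage correction, and the flat extrapolation (each sampled minute stands in for all 60 minutes of its hour) are assumptions made for illustration; the abstract does not state how the authors computed their estimate, and no counts from the paper are reproduced here.

```python
def estimate_daily_posts(minute_counts, coverage=0.99):
    """Extrapolate a daily post total from one sampled minute per hour.

    minute_counts: 24 post counts, one sampled minute for each hour of the day.
    coverage: assumed fraction of posts actually captured in each sampled
              minute (hypothetical correction factor, not from the paper).
    """
    if len(minute_counts) != 24:
        raise ValueError("expected one sampled minute for each of 24 hours")
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    # Scale each sampled minute to its full hour, correct for missed posts,
    # then sum the 24 hourly estimates.
    return sum(count * 60 / coverage for count in minute_counts)


# Purely hypothetical counts, chosen only to show the call shape.
example_counts = [100] * 24
daily_estimate = estimate_daily_posts(example_counts, coverage=1.0)
```

Summing per-hour estimates, rather than multiplying one average by 1,440, keeps the hourly variation visible and makes it easy to inspect which hours drive the total.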
Problem

Research questions and friction points this paper is trying to address.

Extracting representative TikTok samples from specific time ranges
Collecting comprehensive post metadata and video data
Estimating critical platform statistics like daily post volume
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reverse-engineering TikTok unique identifiers
Extracting representative time-based samples
Collecting near-complete post metadata and comments
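
To illustrate what time-based sampling over reverse-engineered identifiers could look like, here is a minimal sketch. It assumes a commonly reported layout in which the upper 32 bits of a 64-bit TikTok post ID hold the Unix creation time in seconds; this summary does not specify the layout, so treat the bit positions and both function names as hypothetical.

```python
from datetime import datetime, timezone

TIMESTAMP_SHIFT = 32  # assumption: top 32 bits of the ID are Unix seconds


def id_to_timestamp(post_id: int) -> datetime:
    """Recover the (assumed) creation time embedded in a post ID."""
    return datetime.fromtimestamp(post_id >> TIMESTAMP_SHIFT, tz=timezone.utc)


def id_range_for_window(start: datetime, end: datetime) -> tuple[int, int]:
    """Return the inclusive ID range whose embedded time falls in [start, end)."""
    lowest = int(start.timestamp()) << TIMESTAMP_SHIFT
    highest = (int(end.timestamp()) << TIMESTAMP_SHIFT) - 1
    return lowest, highest


# One sampled minute, expressed as a range of candidate IDs.
window_start = datetime.fromtimestamp(1_700_000_000, tz=timezone.utc)
window_end = datetime.fromtimestamp(1_700_000_060, tz=timezone.utc)
lowest_id, highest_id = id_range_for_window(window_start, window_end)
```

Under this assumption, a time window maps to a contiguous ID range, but the lower 32 bits still span far too many values to enumerate blindly, which is why inferring the structure of the remaining bits matters for a near-complete sample.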