MagnetDB: A Longitudinal Torrent Discovery Dataset with IMDb-Matched Movies and TV Shows

📅 2025-01-16
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses the lack of empirical data on the origins, distribution evolution, and user behavior surrounding BitTorrent-based piracy—particularly of films and TV series. To this end, we construct the first large-scale, longitudinal, IMDb-structured supply-side torrent dataset spanning 2018–2024, comprising 28.6 million magnet links and 950 million file-level metadata records. Methodologically, we introduce a distributed DHT crawler, multimodal fuzzy matching (leveraging titles, release years, and aliases), semantic normalization, and spatiotemporal indexing to achieve 89.2% precise IMDb alignment for film/TV torrents. The dataset fills a critical gap in supply-side piracy research, enabling fine-grained time-series analysis, cross-platform provenance tracing, and holistic piracy ecosystem modeling. It has already supported over ten empirical studies on digital copyright enforcement and internet governance.

Technology Category

Application Category

📝 Abstract
BitTorrent remains a prominent channel for illicit distribution of copyrighted material, yet the supply side of such content remains understudied. We introduce MagnetDB, a longitudinal dataset of torrents discovered through the BitTorrent DHT between 2018 and 2024, containing more than 28.6 million torrents and metadata of more than 950 million files. While our primary focus is on enabling research based on the supply of pirated movies and TV shows, the dataset also encompasses other legitimate and illegitimate torrents. By applying IMDb-matching and annotation to movie and TV show torrents, MagnetDB facilitates detailed analyses of pirated content evolution in the BitTorrent network. Researchers can leverage MagnetDB to examine distribution trends, subcultural practices, and the gift economy within piracy ecosystems. Through its scale and temporal scope, MagnetDB presents a unique opportunity for investigating the broader dynamics of BitTorrent and advancing empirical knowledge on digital piracy.
Problem

Research questions and friction points this paper is trying to address.

BitTorrent
Piracy
User Behavior
Innovation

Methods, ideas, or system contributions that make the work stand out.

MagnetDB
BitTorrent Piracy Analysis
Digital Gifting Culture
🔎 Similar Papers
No similar papers found.